CHAPTER 1

INTRODUCTION & BASIC CONCEPTS

1.0 : INTRODUCTION :

This chapter of the thesis is devoted to the basic concepts of Reliability, Software Reliability, Bayesian inference and proportional hazards models. Reliability is an important consideration in the planning, design and operation of systems. Reliability theory is concerned with the random occurrence of undesirable events, or failures, during the lifetime of a physical or biological system. Reliability is an inherent attribute of a system, just as its capacity or power rating is. The concept of reliability has been known for a number of years, but it has gained greater significance and importance during the past decade, particularly due to the impact of automation and the development of complex missile and space programmes. In recent years the concept of reliability has been formulated as the science of predicting, estimating or optimizing the probability of survival, the mean life or, more generally, the life distribution of components or systems.

As a result of the advancement of science, man today has to his credit many sophisticated systems designed entirely by his own hands and brain; the Television System, Computer System and Electric Power Supply System are some man-made systems. Generally, when we perform a life testing experiment with man-made systems we call it "Reliability Analysis", while when we deal with God-made systems such as the Human Body System, Weather Changing System or Solar Energy System we name it "Survival Analysis". Reliability and survival are interchangeable terms. The reliability function and the survival function are represented by $R(t)$ and $S(t)$ respectively.

Hosford (1960) defined reliability through the concept of dependability, as the probability that the system will be able to operate when needed, while according to Bazovsky (1961) reliability is a yardstick of the capability of an equipment to operate without failure when put into service. Reliability in its simplest form means the probability that a failure does not occur in a given time interval. The reliability of a component or a system is the probability that it performs its intended function adequately for a specified period of time under the stated operating conditions or environment. In other words, Reliability Engineering is a branch of science that deals with life testing experiments on engineering systems such as Radio, Television, Electric Power Supply and Computer Systems.



1.1 : RELIABILITY FUNCTION :

If $T$ is a random variable denoting the time to failure of a component, then the reliability function is the probability that the component will not fail in a given environment before time $t$, and can be written as

$$R(t) = P[T > t] = 1 - P[T \le t] = 1 - \bar{R}(t) = 1 - F(t) = \bar{F}(t) \tag{1.1}$$

where $F(t)$ is the cumulative distribution function (c.d.f.) of $T$, also called the unreliability $\bar{R}(t)$ of the component, so that

$$R(t) + \bar{R}(t) = 1$$

Thus the reliability is a function of time and depends on environmental conditions, which may or may not vary with time. Since the reliability of a unit is a probability, its numerical value always lies between 0 and 1, with

$$\lim_{t \to 0} R(t) = 1 \quad \text{and} \quad \lim_{t \to \infty} R(t) = 0$$

1.2 : FAILURE TIME DISTRIBUTION :

Let $T$ be a non-negative random variable representing the failure time of an individual from a homogeneous population. The probability distribution of $T$ can be specified in many ways, three of which are particularly useful in survival applications: the Survivor Function, the Probability Density Function and the Hazard Function. Interrelations between these three representations are given below for both discrete and continuous distributions. The survivor function is defined for both discrete and continuous distributions as the probability that $T$ is at least as great as a value $t$, that is

$$S(t) = P(T \ge t), \quad 0 < t < \infty \tag{1.2}$$

1.2.1 CASE I - ABSOLUTELY CONTINUOUS :

The probability density function (p.d.f.) of $T$ is

$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} = -\frac{dS(t)}{dt} \tag{1.3}$$

Conversely, the survivor function $S(t)$ is given as

$$S(t) = \int_t^{\infty} f(s)\,ds$$

and

$$f(t) \ge 0 \quad \text{with} \quad \int_0^{\infty} f(t)\,dt = 1$$

The range of $T$ is $[0, \infty)$, and this should be understood as the domain of definition for functions of $t$.

The hazard function specifies the instantaneous rate of failure at $T = t$ conditional upon survival to time $t$ and is defined as

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} \tag{1.4}$$

where $h(t)$ specifies the distribution of $T$, since from (1.4)

$$h(t) = -\frac{d \log S(t)}{dt}$$

so that on integrating and using $S(0) = 1$, we get

$$S(t) = \exp\left[-\int_0^t h(u)\,du\right] \tag{1.5}$$

The p.d.f. of $T$ can be written

$$f(t) = h(t) \exp\left[-\int_0^t h(u)\,du\right] \tag{1.6}$$

From equation (1.5), we can say that $h(t)$ is a non-negative function with

$$\int_0^s h(u)\,du < \infty \ \text{ for some } s > 0, \qquad \int_0^{\infty} h(u)\,du = \infty$$
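The relation (1.5) between the hazard and the survivor function can be checked numerically. The following is a minimal sketch, assuming a hypothetical hazard $h(u) = 2u$ (chosen only because its survivor function $e^{-t^2}$ is known in closed form); the function names are illustrative and not part of the thesis.

```python
import math

def survivor_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral_0^t h(u) du) with the midpoint rule."""
    width = t / steps
    cumulative = sum(hazard((i + 0.5) * width) for i in range(steps)) * width
    return math.exp(-cumulative)

def linear_hazard(u):
    # Hypothetical increasing hazard h(u) = 2u, for which S(t) = exp(-t^2).
    return 2.0 * u

t = 1.5
s_numeric = survivor_from_hazard(linear_hazard, t)   # equation (1.5)
s_exact = math.exp(-t ** 2)
density = linear_hazard(t) * s_numeric               # f(t) = h(t) S(t), equation (1.6)
```

Any non-negative hazard whose integral diverges can be substituted for `linear_hazard`; only the closed-form comparison is specific to this choice.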


The expected residual life at time $t$ is given by

$$r(t) = E[(T - t) \mid T \ge t], \quad 0 \le t < \infty \tag{1.7}$$

which uniquely determines a continuous survival distribution with finite mean, since

$$r(t) = \int_t^{\infty} \frac{(u - t) f(u)}{S(t)}\,du = \int_t^{\infty} \frac{S(u)}{S(t)}\,du \tag{1.8}$$

on integrating by parts. We then have

$$\frac{1}{r(t)} = -\frac{d}{dt} \log \int_t^{\infty} S(u)\,du$$

Substituting $t = 0$ in (1.8), we get

$$r(0) = \int_0^{\infty} S(u)\,du \tag{1.9}$$

and

$$\int_0^t \frac{du}{r(u)} = -\log \int_t^{\infty} S(u)\,du + \log r(0)$$

which leads finally to

$$S(t) = \frac{r(0)}{r(t)} \exp\left[-\int_0^t \frac{du}{r(u)}\right] \tag{1.10}$$


1.2.2 CASE II - WHEN T IS DISCRETE :

If $T$ is a discrete random variable taking values $x_1 < x_2 < \ldots$ with associated probability function

$$p(x_i) = P(T = x_i), \quad i = 1, 2, 3, \ldots$$

then the survivor function is

$$S(t) = \sum_{j:\, x_j \ge t} p(x_j) \tag{1.11}$$

$$S(t) = \sum_{j} p(x_j)\, H(x_j - t) \tag{1.12}$$

where $H(x)$ is the Heaviside function

$$H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

The hazard at $x_j$ is defined as the conditional probability of failure at $x_j$,

$$h_j = P(T = x_j \mid T \ge x_j) = \frac{p(x_j)}{S(x_j)}, \quad j = 1, 2, \ldots \tag{1.13}$$

Corresponding to (1.5) and (1.6), the survivor function and the probability function are given by

$$S(t) = \prod_{j:\, x_j < t} (1 - h_j) \tag{1.14}$$

and

$$p(x_j) = h_j \prod_{i=1}^{j-1} (1 - h_i) \tag{1.15}$$
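The discrete relations (1.13) to (1.15) can be illustrated with a small round trip from a probability function to hazards and back. This is a sketch under assumed inputs: the probabilities below are hypothetical, and the helper names are not from the thesis.

```python
def discrete_hazards(probs):
    """Return hazards h_j and survivor values S(x_j) = P(T >= x_j)."""
    hazards, survivors = [], []
    remaining = 1.0                      # S(x_1) = 1
    for p in probs:
        survivors.append(remaining)
        hazards.append(p / remaining)    # equation (1.13)
        remaining -= p
    return hazards, survivors

def probs_from_hazards(hazards):
    """Rebuild p(x_j) = h_j * prod_{i<j} (1 - h_i), equation (1.15)."""
    probs, running = [], 1.0
    for h in hazards:
        probs.append(h * running)
        running *= 1.0 - h               # running product of equation (1.14)
    return probs

probs = [0.1, 0.2, 0.3, 0.4]             # hypothetical p(x_1), ..., p(x_4)
hazards, survivors = discrete_hazards(probs)
rebuilt = probs_from_hazards(hazards)
```

The last hazard equals 1 (up to rounding) because failure is certain at the final support point.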


1.2.3 ESTIMATION OF THE SURVIVOR FUNCTION :

Let $F_n(x)$ be the sample cumulative distribution function,

$$F_n(x) = \frac{\text{number of sample values} \le x}{n}$$

A plot of $F_n(x)$ versus $x$ visually represents the sample and provides full information on the percentile points, the dispersion and the general features of the sample distribution. It is an indispensable aid in studying the distribution shape of the population from which the sample arose; in fact, the sample distribution function can serve as a basic tool in constructing formal tests of goodness of fit of the data to hypothesised probability models.

In the analysis of survival data it is very often useful to summarise the survival experience of particular groups of patients in terms of the sample c.d.f. or, more usually, in terms of the sample survivor function. If an uncensored sample of $n$ distinct failure times is observed from a homogeneous population, the sample survivor function is a step function decreasing by $n^{-1}$ immediately following each observed failure time.


Let $t_1 < t_2 < \ldots < t_k$ represent the observed failure times in a sample of size $n_0$ from a homogeneous population with survivor function $S(t)$. Suppose that $d_j$ items fail at $t_j$ $(j = 1, 2, \ldots, k)$ and $m_j$ items are censored in the interval $[t_j, t_{j+1})$ at times $t_{j1}, \ldots, t_{jm_j}$ $(j = 0, 1, \ldots, k)$, where $t_0 = 0$ and $t_{k+1} = \infty$. Let $n_j = (m_j + d_j) + \ldots + (m_k + d_k)$ be the number of items at risk at a time just prior to $t_j$. The probability of failure at $t_j$ is

$$S(t_j) - S(t_j + 0)$$

where

$$S(t_j + 0) = \lim_{x \to 0+} S(t_j + x), \quad (j = 1, 2, \ldots, k)$$

We assume that the contribution to the likelihood of a survival time censored at $t_{jl}$ is

$$P(T > t_{jl}) = S(t_{jl} + 0)$$

In effect, we are assuming that the observed censoring time $t_{jl}$ indicates only that the unobserved failure time is greater than $t_{jl}$, as would be the case if the censoring times were fixed in advance for each item. Thus we obtain

$$L = \prod_{j=0}^{k} \left\{ \left[ S(t_j) - S(t_j + 0) \right]^{d_j} \prod_{l=1}^{m_j} S(t_{jl} + 0) \right\}$$

(with $d_0 = 0$), which is a likelihood function on the space of survivor functions $S(t)$ for the given data. The maximum likelihood estimate is the survivor function $\hat{S}(t)$ that maximizes $L$. This definition of the maximum likelihood estimate is a generalization of the usual concept used in parametric models. There are dangers associated with maximising likelihoods over many parameters, since such techniques may lead to inefficient or inconsistent estimates. The results of such maximization require some investigation to assure that they are reasonable.

Clearly $\hat{S}(t)$ is discontinuous at the observed failure times, since otherwise $L = 0$. Further, since $t_{jl} \ge t_j$, $S(t_{jl} + 0)$ is maximized by taking

$$S(t_{jl} + 0) = S(t_j + 0) \quad (j = 1, 2, \ldots, k;\ l = 1, 2, \ldots, m_j)$$

and $S(t_{0l}) = 1$ $(l = 1, \ldots, m_0)$. The function $\hat{S}(t)$ is then a discrete survivor function with hazard components $\hat{h}_1, \ldots, \hat{h}_k$ at $t_1, \ldots, t_k$ respectively. Thus,

$$\hat{S}(t_j) = \prod_{l=1}^{j-1} (1 - \hat{h}_l) \tag{1.16}$$

$$\hat{S}(t_j + 0) = \prod_{l=1}^{j} (1 - \hat{h}_l) \tag{1.17}$$

where the $\hat{h}_l$ are chosen to maximize the function

$$L = \prod_{j=1}^{k} \left[ \hat{h}_j^{d_j} \prod_{l=1}^{j-1} (1 - \hat{h}_l)^{d_j} \prod_{l=1}^{j} (1 - \hat{h}_l)^{m_j} \right] = \prod_{j=1}^{k} \hat{h}_j^{d_j} (1 - \hat{h}_j)^{n_j - d_j}$$

obtained by substitution of (1.16) and (1.17) in the likelihood. This gives

$$\hat{h}_j = \frac{d_j}{n_j} \quad (j = 1, 2, \ldots, k) \tag{1.18}$$

and the product limit estimate of the survivor function is

$$\hat{S}(t) = \prod_{j:\, t_j < t} \frac{n_j - d_j}{n_j} \tag{1.19}$$


The estimate $\hat{S}(t)$ is the direct generalisation of the sample survivor function for censored data. It was derived by Kaplan and Meier (1958) and is known as the Kaplan-Meier estimate.

The induced expression for the asymptotic variance of $\hat{S}(t)$ is then

$$\widehat{\mathrm{Var}}(\hat{S}(t)) = \hat{S}^2(t) \sum_{j:\, t_j < t} \frac{d_j}{n_j (n_j - d_j)} \tag{1.20}$$

The expression (1.20) is known as Greenwood's formula (Greenwood (1926)); it was first derived as the asymptotic variance of the classical life table estimator.
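The product limit estimate (1.19) and Greenwood's formula (1.20) translate directly into a short computation. The sketch below is illustrative only: the data are hypothetical, a censored time tied with a failure time is counted as still at risk at that failure time (matching the interval convention $[t_j, t_{j+1})$ above), and the function name is an assumption.

```python
def kaplan_meier(times, events):
    """Product limit estimate for right-censored data.

    times  : observed times (failure or censoring)
    events : 1 if the time is a failure, 0 if it is censored
    Returns rows (t_j, S_hat(t) for t just after t_j, Greenwood variance).
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    estimate, greenwood_sum, rows = 1.0, 0.0, []
    for t_j in failure_times:
        n_j = sum(1 for t in times if t >= t_j)          # at risk just prior to t_j
        d_j = sum(1 for t, e in zip(times, events) if t == t_j and e == 1)
        estimate *= (n_j - d_j) / n_j                    # equation (1.19)
        if n_j > d_j:
            greenwood_sum += d_j / (n_j * (n_j - d_j))   # equation (1.20)
            variance = estimate ** 2 * greenwood_sum
        else:
            variance = float("nan")   # Greenwood's sum is undefined when n_j = d_j
        rows.append((t_j, estimate, variance))
    return rows

# Hypothetical sample: six items, two of them censored (event = 0).
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
rows = kaplan_meier(times, events)
```

For this sample the estimate steps through 5/6, 2/3 and 4/9 before dropping to zero at the last failure.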

1.3 : PARAMETRIC FAILURE TIME MODELS :

The main interest of this section is to consider the relationship between failure time and explanatory variables. We should therefore first consider failure time distributions for a homogeneous population. The Weibull and Exponential distributions are the most often used parametric models for failure time data. These distributions admit closed form expressions for tail area probabilities. The log-normal and gamma distributions are still frequently applied to failure time data but are less convenient computationally.

Let $T > 0$ be a random variable representing failure time and let $t$ represent a typical point in its range. We use

$$Y = \log T$$

to represent the log failure time. Shape comparisons among the parametric models are often simpler in terms of $Y$ than $T$.

1.3.1 THE EXPONENTIAL DISTRIBUTION :

The one-parameter Exponential Distribution is obtained if we take the hazard function to be a constant,

$$h(t) = \theta, \quad \theta > 0$$

over the range of $T$. The instantaneous failure rate is independent of $t$. As we know, the Exponential distribution has the memoryless property: the conditional chance of failure in a time interval of specified length is the same regardless of how long the individual has been on trial. The survivor function of $T$ is given by

$$S(t) = \exp\left[-\int_0^t h(u)\,du\right] = \exp\left[-\int_0^t \theta\,du\right] = e^{-\theta t}$$

and its probability density function is

$$f(t; \theta) = -\frac{dS(t)}{dt} = \theta e^{-\theta t}, \quad t > 0 \tag{1.21}$$

The p.d.f. of $Y = \log T$ is obtained by means of the transformation $y = \log t$, so that $t = e^{y}$ and $dt/dy = e^{y}$. Therefore

$$g_Y(y) = f(t) \left| \frac{dt}{dy} \right| = \theta e^{y} \exp(-\theta e^{y}) = \exp\left[(y - \alpha) - e^{y - \alpha}\right], \quad -\infty < y < \infty \tag{1.22}$$

where $\alpha = -\log \theta$, i.e. $\theta = e^{-\alpha}$.

Let us make the transformation $Y = \alpha + W$. The p.d.f. of $W$ is given by

$$g_W(w) = \exp(w - e^{w}), \quad -\infty < w < \infty \tag{1.23}$$

which is an extreme value distribution. The exponential distribution also arises as the limiting form of the distribution of the minimum of samples from some densities with range on $(0, a)$ for some $a < \infty$. This can sometimes be taken as theoretical justification for its use in survival studies in which a complex mechanism fails when any one of its many components fails.

1.3.2 THE WEIBULL DISTRIBUTION :

The Weibull Distribution with two parameters $\theta_1$ and $\theta_2$ is a generalisation of the exponential distribution, with hazard function given by

$$\lambda(t) = \theta_1 \theta_2 (\theta_1 t)^{\theta_2 - 1}, \quad \theta_1, \theta_2 > 0$$

$\lambda(t)$ is monotone decreasing for $\theta_2 < 1$, increasing for $\theta_2 > 1$, and reduces to the constant exponential hazard if $\theta_2 = 1$.

The p.d.f. of $T$ is given as

$$f(t) = \theta_1 \theta_2 (\theta_1 t)^{\theta_2 - 1} \exp\left[-(\theta_1 t)^{\theta_2}\right], \quad 0 < t < \infty \tag{1.24}$$

and the survivor function is

$$S(t) = \exp\left[-(\theta_1 t)^{\theta_2}\right] \tag{1.25}$$

so that

$$\log(-\log S(t)) = \theta_2 (\log t + \log \theta_1)$$

The plot of $\log(-\log \hat{S}(t))$ versus $\log t$ gives an empirical check for the Weibull Distribution, where $\hat{S}(t)$ is a sample estimate of the survivor function. The plot should give approximately a straight line, the slope of which provides a rough estimate of $\theta_2$ and the $\log t$ intercept an estimate of $-\log \theta_1$.

Therefore, the p.d.f. of $Y = \log T$ is

$$g(y) = \sigma^{-1} \exp\left[\frac{y - \alpha}{\sigma} - e^{(y - \alpha)/\sigma}\right], \quad -\infty < y < \infty \tag{1.26}$$

where $\sigma = \theta_2^{-1}$ and $\alpha = -\log \theta_1$.

More simply, we can write

$$Y = \alpha + \sigma W$$

where $W$ has the extreme value p.d.f. (1.23).

The shape of the density for $Y$ is fixed because $\theta_1$ and $\theta_2$ affect only the location and the scaling of the distribution. The Weibull Distribution can also be developed as the limiting distribution of the minimum of a sample from a continuous distribution with range on $[0, u)$ for some $u$ $(0 < u < \infty)$.
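The graphical Weibull check described above, plotting $\log(-\log \hat{S}(t))$ against $\log t$, amounts to fitting a straight line. The sketch below uses ordinary least squares as one possible way to read off the slope and intercept; the fitting method, the function name and the test values $\theta_1 = 0.5$, $\theta_2 = 2$ are assumptions for illustration, not the thesis's procedure.

```python
import math

def weibull_plot_fit(times, survivor_estimates):
    """Least-squares line through the points (log t, log(-log S))."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(s)) for s in survivor_estimates]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar          # value of the line at log t = 0
    theta2 = slope                             # slope estimates theta_2
    theta1 = math.exp(intercept / slope)       # intercept equals theta_2 * log(theta_1)
    return theta1, theta2

# Hypothetical check with exact Weibull survivor values (theta_1 = 0.5, theta_2 = 2).
times = [0.5, 1.0, 1.5, 2.0, 3.0]
survivors = [math.exp(-(0.5 * t) ** 2) for t in times]
theta1_hat, theta2_hat = weibull_plot_fit(times, survivors)
```

With real data the survivor values would come from a sample estimate such as the product limit estimate, and the points would only lie approximately on a line.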

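1.3.3 THE LOG-NORMAL DISTRIBUTION :

The density function of a log-normal variate $T$ is given by

$$f(t) = \frac{\theta_2}{t \sqrt{2\pi}} \exp\left[-\frac{\theta_2^2 (\log \theta_1 t)^2}{2}\right], \quad t > 0 \tag{1.27}$$

If we make the transformation $Y = \log T$, i.e. $y = \log t$, $t = e^{y}$ and $dt/dy = e^{y}$, then the p.d.f. of $Y$ is given as

$$g(y) = \frac{\theta_2}{\sqrt{2\pi}} \exp\left[-\frac{\theta_2^2 (y + \log \theta_1)^2}{2}\right] = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(y - \alpha)^2}{2\sigma^2}\right] \tag{1.28}$$

where $\alpha = -\log \theta_1$ and $\sigma = \theta_2^{-1}$. Further, if we put $Y = \alpha + \sigma W$, then $W$ is a standard normal variate with density function

$$\phi(w) = \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2} \tag{1.29}$$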

The survivor function is given by

$$S(t) = \int_t^{\infty} f(u)\,du = \int_t^{\infty} \frac{\theta_2}{u \sqrt{2\pi}} \exp\left[-\frac{\theta_2^2 (\log \theta_1 u)^2}{2}\right] du$$

Putting $\log u = y$, so that $\frac{1}{u}\,du = dy$,

$$S(t) = \int_{\log t}^{\infty} \frac{\theta_2}{\sqrt{2\pi}} \exp\left[-\frac{\theta_2^2 (y + \log \theta_1)^2}{2}\right] dy = \int_{\log t}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(y - \alpha)^2}{2\sigma^2}\right] dy$$

where $\alpha = -\log \theta_1$ and $\sigma = \theta_2^{-1}$. Putting $w = (y - \alpha)/\sigma$, so that $dy = \sigma\,dw$,

$$S(t) = \int_{(\log t - \alpha)/\sigma}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\,dw = 1 - \int_{-\infty}^{\theta_2 \log \theta_1 t} \phi(w)\,dw = 1 - \Phi(\theta_2 \log \theta_1 t) \tag{1.30}$$

where

$$\Phi(w) = \int_{-\infty}^{w} \phi(u)\,du$$

The hazard function is

$$h(t) = \frac{f(t)}{S(t)}$$

The hazard function has value 0 at $t = 0$, increases to a maximum and then decreases, approaching zero as $t$ becomes large.

The log-normal model is particularly simple to apply if there is no censoring, but with censoring the computations become difficult.
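The rise-and-fall shape of the log-normal hazard can be seen by evaluating (1.27) and (1.30) directly. This is a small sketch that expresses $\Phi$ through the error function; the parameter values $\theta_1 = \theta_2 = 1$ and the three evaluation points are hypothetical.

```python
import math

def lognormal_density(t, theta1, theta2):
    z = theta2 * math.log(theta1 * t)
    return theta2 / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)   # (1.27)

def lognormal_survivor(t, theta1, theta2):
    z = theta2 * math.log(theta1 * t)
    standard_normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - standard_normal_cdf                                          # (1.30)

def lognormal_hazard(t, theta1, theta2):
    return lognormal_density(t, theta1, theta2) / lognormal_survivor(t, theta1, theta2)

# Hazard at an early, a middle and a late time point.
hazards = [lognormal_hazard(t, 1.0, 1.0) for t in (0.05, 1.0, 50.0)]
```

The middle value is the largest of the three, consistent with a hazard that increases from zero to a maximum and then decays.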

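1.3.4 THE GAMMA DISTRIBUTION :

The Gamma Distribution is a two-parameter generalisation of the exponential distribution with density function

$$f(t) = \frac{\theta_1 (\theta_1 t)^{\theta_2 - 1} e^{-\theta_1 t}}{\Gamma(\theta_2)}, \quad \theta_1, \theta_2 > 0,\ 0 < t < \infty \tag{1.31}$$

If we make the transformation $Y = \log T$, i.e. $y = \log t$, $t = e^{y}$ and $dt/dy = e^{y}$, then the p.d.f. of $Y = \log T$ is given by

$$g(y) = \frac{\theta_1^{\theta_2} (e^{y})^{\theta_2} \exp(-\theta_1 e^{y})}{\Gamma(\theta_2)} = \frac{\exp\left[\theta_2 (y - \alpha) - e^{y - \alpha}\right]}{\Gamma(\theta_2)}$$

where $\alpha = -\log \theta_1$, i.e. $\theta_1 = e^{-\alpha}$.

Further, if $Y = \alpha + W$, the density function of $W$ is

$$g(w) = \frac{\exp(\theta_2 w - e^{w})}{\Gamma(\theta_2)} \tag{1.32}$$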


The error quantity $W$ has a negatively skewed distribution, with skewness decreasing with increasing $\theta_2$; at $\theta_2 = 1$ it is the exponential model. The survivor function of the Gamma Distribution is

$$S(t) = P[T > t] = \int_t^{\infty} f(u)\,du = 1 - \int_0^t f(u)\,du = 1 - \frac{\theta_1^{\theta_2}}{\Gamma(\theta_2)} \int_0^t u^{\theta_2 - 1} e^{-\theta_1 u}\,du$$

Putting $\theta_1 u = v$, so that $\theta_1\,du = dv$,

$$S(t) = 1 - \frac{1}{\Gamma(\theta_2)} \int_0^{\theta_1 t} v^{\theta_2 - 1} e^{-v}\,dv = 1 - I_{\theta_2}(\theta_1 t) \tag{1.33}$$

where $I_{\theta_2}(s)$ is the incomplete Gamma integral

$$I_{\theta_2}(s) = \frac{1}{\Gamma(\theta_2)} \int_0^{s} x^{\theta_2 - 1} e^{-x}\,dx \tag{1.34}$$

and the hazard function $h(t)$ is given by

$$h(t) = \frac{f(t)}{S(t)} = \frac{\theta_1 (\theta_1 t)^{\theta_2 - 1} e^{-\theta_1 t}\, [\Gamma(\theta_2)]^{-1}}{1 - I_{\theta_2}(\theta_1 t)} \tag{1.35}$$

The hazard function is monotone increasing from 0 if $\theta_2 > 1$ and monotone decreasing from $\infty$ if $\theta_2 < 1$; in either case it approaches $\theta_1$ as $t$ becomes large.

If $\theta_2 = 1$, the Gamma Distribution reduces to the exponential distribution. With integer $\theta_2$, the gamma distribution is also known as a special Erlangian Distribution. The Gamma distribution with integer $\theta_2$ can also be derived as the waiting time to the $\theta_2$th emission from a Poisson source with intensity parameter $\theta_1$. The sum of $\theta_2$ independent exponential variates with failure rate $\theta_1$ also has the Gamma Distribution with parameters $\theta_1$ and $\theta_2$.
1.3.5 LOG-LOGISTIC DISTRIBUTION :

Consider the model

$$Y = \log T = \alpha + \sigma W$$

where $\alpha = -\log \theta_1$.

We can construct different failure time models by selecting different distributions for the error variable $W$. One such model is the log-logistic distribution for $T$, obtained if $W$ has the logistic density

$$g(w) = \frac{e^{w}}{(1 + e^{w})^2} \tag{1.36}$$

This is a symmetric density with mean and variance given by

$$E(W) = 0, \qquad V(W) = \frac{\pi^2}{3}$$

The p.d.f. of $T$ is then

$$f(t; \theta_1, \theta_2) = \theta_1 \theta_2 (\theta_1 t)^{\theta_2 - 1} \left[1 + (\theta_1 t)^{\theta_2}\right]^{-2}, \quad 0 < t < \infty \tag{1.37}$$

where $\theta_1 = e^{-\alpha}$ and $\theta_2 = \sigma^{-1}$.

The survivor and hazard functions are given by

$$S(t) = \frac{1}{1 + (\theta_1 t)^{\theta_2}} \tag{1.38}$$

$$h(t) = \frac{\theta_1 \theta_2 (\theta_1 t)^{\theta_2 - 1}}{1 + (\theta_1 t)^{\theta_2}} \tag{1.39}$$

The distribution is more useful for handling censored data than the log-normal distribution, while providing a good approximation to it except in the extreme tails. The hazard function is identical to the Weibull hazard aside from the denominator factor $1 + (\theta_1 t)^{\theta_2}$. It is monotone decreasing from $\infty$ if $\theta_2 < 1$ and is monotone decreasing from $\theta_1$ if $\theta_2 = 1$. If $\theta_2 > 1$, the hazard function resembles the log-normal hazard in that it increases from zero to a maximum at $t = (\theta_2 - 1)^{1/\theta_2} / \theta_1$ and decreases toward zero thereafter.
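The location of the hazard maximum for $\theta_2 > 1$ can be confirmed by a simple grid search over the hazard (1.39). The values $\theta_1 = 0.5$, $\theta_2 = 3$ and the grid spacing below are hypothetical choices made for illustration.

```python
def loglogistic_hazard(t, theta1, theta2):
    scaled = (theta1 * t) ** theta2
    return theta1 * theta2 * (theta1 * t) ** (theta2 - 1) / (1.0 + scaled)   # (1.39)

theta1, theta2 = 0.5, 3.0                              # hypothetical, with theta_2 > 1
t_peak = (theta2 - 1.0) ** (1.0 / theta2) / theta1     # closed-form maximiser

grid = [0.01 * k for k in range(1, 2001)]              # t from 0.01 to 20.00
t_best = max(grid, key=lambda t: loglogistic_hazard(t, theta1, theta2))
```

The grid maximiser agrees with the closed-form value to within the grid spacing.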
1.3.6 GENERALISED GAMMA DISTRIBUTION :

The p.d.f. of the three-parameter generalised gamma distribution is given by

$$f(t; \theta_1, \theta_2, \theta_3) = \frac{\theta_1 \theta_2 (\theta_1 t)^{\theta_2 \theta_3 - 1} \exp\left[-(\theta_1 t)^{\theta_2}\right]}{\Gamma(\theta_3)}, \quad t > 0 \tag{1.40}$$

where $\theta_1 = e^{-\alpha}$ and $\theta_2 = \sigma^{-1}$.

This model was introduced by Stacy (1962).

Special Cases :

(i) When $\theta_2 = 1 = \theta_3$, we get the exponential distribution.

(ii) When $\theta_2 = 1$, we get the gamma distribution.

(iii) When $\theta_3 = 1$, we get the Weibull distribution.

The log-normal is also the limiting case as $\theta_3 \to \infty$.

1.3.7 REGRESSION MODELS :

So far we have considered several survival distributions for modelling the survival experience of a homogeneous population. However, there are explanatory variables upon which failure time may depend. Therefore, it is of interest to consider generalisations of these models to take into account concomitant information on the individuals sampled.

Consider a failure time $T > 0$ and suppose a vector $x = (x_1, \ldots, x_p)$ of explanatory variables or covariates has been observed. $x$ may include both quantitative and qualitative variables. The principal problem is that of modelling and determining the relationship between $T$ and $x$.
1.3.7.1 Exponential Regression Models :

The exponential distribution can be generalised to obtain a regression model by allowing the failure rate to be a function of the covariates $x$. The hazard at time $t$ for an individual with covariates $x$ is

$$\lambda(t; x) = \lambda(x)$$

Thus the hazard for a given $x$ is a constant, but the failure rate depends on $x$. Suppose the effect of the components of $x$ is linear. Then

$$\lambda(t; x) = \lambda\, c(x\beta)$$

where $\beta$ is a vector of regression parameters, $\lambda$ is a constant and $c$ is a specified functional form. The choice of $c$ may depend on the particular data being considered. Three specific forms have been used :

(1) $c(x) = 1 + x$

(2) $c(x) = (1 + x)^{-1}$

(3) $c(x) = e^{x}$

The first two of these correspond to (1) the failure rate and (2) the mean survival time being a linear function of $x$. They both suffer from the disadvantage that the set of $\beta$ values considered must be restricted to guarantee $c(x\beta) > 0$ for all possible $x$.

In many ways (3) is the most natural form, since it takes only positive values. We use the form $c(x) = e^{x}$ here. Consider then the model with hazard function

$$\lambda(t; x) = \lambda e^{x\beta} \tag{1.41}$$

The conditional density function of $T$ given $x$ is then

$$f(t; x) = \lambda e^{x\beta} \exp\left(-\lambda t e^{x\beta}\right)$$

In other words, the model specifies that the log failure rate is a linear function of the covariates $x$. In terms of the log survival time $Y = \log T$, the model can be written as

$$Y = \alpha - x\beta + W$$

where $\alpha = -\log \lambda$ and $W$ has the extreme value distribution.
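To make the multiplicative role of the covariates in the exponential regression hazard (1.41) concrete, the sketch below evaluates the hazard and the corresponding survivor function for two individuals. The covariate vectors, the coefficients and the baseline rate are hypothetical values, and the function names are illustrative.

```python
import math

def exp_regression_hazard(x, beta, lam):
    """lambda(t; x) = lam * exp(x . beta); the hazard does not depend on t."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return lam * math.exp(linear_predictor)

def exp_regression_survivor(t, x, beta, lam):
    """S(t; x) = exp(-t * lambda(t; x)) for the constant-hazard model."""
    return math.exp(-t * exp_regression_hazard(x, beta, lam))

beta = [0.7, -0.2]          # hypothetical regression parameters
lam = 0.05                  # hypothetical baseline failure rate
x_control = [0.0, 1.0]
x_treated = [1.0, 1.0]      # differs from x_control by one unit in the first covariate

hazard_ratio = (exp_regression_hazard(x_treated, beta, lam)
                / exp_regression_hazard(x_control, beta, lam))
```

A one-unit change in a covariate multiplies the failure rate by $e^{\beta_i}$, which is what makes the log failure rate linear in $x$.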

1.3.7.2 Weibull Regression Model :

In this model, the conditional hazard function is

$$\lambda(t; x) = \lambda p (\lambda t)^{p - 1} e^{x\beta} \tag{1.42}$$

The conditional density of $T$ is given as

$$f(t; x) = \lambda p (\lambda t)^{p - 1} e^{x\beta} \exp\left[-(\lambda t)^{p} e^{x\beta}\right] \tag{1.43}$$

The effect of the covariates is again to act multiplicatively on the Weibull hazard. If $Y = \log T$, the model (1.43) is the linear model

$$Y = \alpha + x\beta^{*} + \sigma W \tag{1.44}$$

where $\alpha = -\log \lambda$, $\sigma = p^{-1}$ and $\beta^{*} = -\sigma\beta$.

1.4 : SOFTWARE RELIABILITY :

As software forms an important part of many critical missions such as space shuttles, and of important systems such as nuclear reactors and heart monitors, the reliable operation of these projects/systems depends critically on the reliable operation of their software. The concept of Software Reliability has therefore gained considerable importance.

The Life Cycle (LC) of software involves a series of production activities and can generally be divided into four phases: Design, Coding, Testing and Operation/Maintenance. In spite of the great advancement of programming technology, the chances of error/fault occurrence due to human imperfection at every step are many. In other words, software can never be made error/fault/bug free. A fault/error is a defect in the software that, when executed under particular conditions, causes a failure, where failure means that the program in its functioning has not met the user's requirements in some way. To remove these faults/errors, the software is tested under a large number of representative test cases with the intention of exposing faults/errors possibly contained in the program.

When a failure is observed, the code is inspected to find the fault/error which caused the failure and an attempt is made to fix it. As a result, the reliability of the software is expected to increase. Thus it is very important to know the number of errors/faults remaining in the software or the time interval between software failures. One of the approaches to software reliability is to describe a Software Error Detection Process which represents the behaviour of errors/faults during the testing phase of software development. Many models have been developed which attempt to estimate the error content of a software and to predict the software reliability. These models are called Software Reliability Growth Models (SRGM), since the reliability of a software is expected to grow as a result of testing.

Here we first review some of the models based on the Non-Homogeneous Poisson Process (NHPP). The models discussed here are based either on the failure intensity function or on the mean value function. The two are not necessarily interchangeable. In Chapter Two and Chapter Three of the thesis we developed two discrete software reliability growth models, viz.

(i) A Discrete Software Reliability Growth Model with leading and dependent errors.

(ii) A Discrete Imperfect Debugging Software Reliability Growth Model.

Here we give a brief review of the release policies and different models.

1.4.1 ASSUMPTIONS :

Some of the general assumptions made in every model are as follows :

1. The Software System is subject to failure during execution caused by errors/faults remaining in the system.

2. The failure rate of the software is equally affected by errors/faults remaining in the software.

3. The number of failures detected at any time is proportional to the remaining number of errors/faults in the software.

4. On a failure detected at any time is proportional -

5. All faults/errors are mutually independent from the failure detection point of view.

6. The proportionality of failure detection/fault isolation/fault removal is constant.

7. Corresponding to the error detection/removal phenomenon at the manufacturer/user end, there exists an equivalent error detection/removal phenomenon at the user/manufacturer end.

8. The Software Life Cycle is more than the optimum release time.

9. The error detection/removal phenomenon is modelled by NHPP.

1.4.2 NOTATIONS :

$\{N(t);\ t \ge 0\}$ = Counting process representing the cumulative number of failures/isolations/removals in $(0, t)$.
$a$ = S-expected initial error/fault content.
$b$ = Error detection/isolation/removal rate per error.
$b_i$ = Initial value of $b$.
$b_f$ = Final value of $b$.
$m_f(t)$ = S-expected number of failures in $(0, t)$.
$m_i(t)$ = S-expected number of isolations in $(0, t)$.
$m_r(t)$ = S-expected number of removals in $(0, t)$.
$b(t)$ = Error-detection rate per error as a function of $t$.
$b(m_f)$ = Failure detection rate per error as a function of $m_f(t)$.
$\lambda(t)$ = Failure intensity at $t$.
$w(t)$ = Current testing effort expenditure at time $t$.
$W(t)$ = Cumulative testing effort by time $t$, $= \int_0^t w(x)\,dx$.
 = are parameters of the testing effort function.
$R(x \mid t)$ = Reliability of a Software in $(t, t + x)$.
$\lambda_0$ = Initial failure intensity.
$T_{lc}$ = Life Cycle of the Software.
$C_1$ = Cost of removing a fault before releasing.
$C_2$ = Cost of removing an error after releasing.
$C_3$ = Testing Cost per Time.
$C(T)$ = Total Cost Function.
$\lambda_d$ = Desired level of failure intensity to be achieved.
$R_d$ = Desired level of reliability to be achieved.


SOME IMPORTANT DEFINITIONS

1.4.3 NON-HOMOGENEOUS POISSON PROCESS (NHPP) :

Let $\{N(t);\ t \ge 0\}$ be a counting process representing the cumulative number of failures by time $t$. $N(t)$ is a random variable and the process $\{N(t);\ t \ge 0\}$ is an NHPP if

(i) $N(0) = 0$.

(ii) $\{N(t);\ t \ge 0\}$ has independent increments.

(iii) $P[\text{two or more events in } (t, t + h)] = o(h)$.

(iv) $P[\text{exactly one event in } (t, t + h)] = \lambda(t) h + o(h)$, where $\lambda(t)$ is the intensity function of $N(t)$.

If we let $m(t) = \int_0^t \lambda(x)\,dx$ represent $m_f(t)$ or $m_i(t)$ or $m_r(t)$ depending upon the model, then it can be shown that

$$P[N(t) = n] = \frac{[m(t)]^{n}\, e^{-m(t)}}{n!}, \quad n = 0, 1, 2, \ldots \tag{1.45}$$

i.e. $N(t)$ has a Poisson distribution with expected value $E[N(t)] = m(t)$ for $t > 0$, and $m(t)$ is called the mean value function of the NHPP.
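As a numerical illustration of (1.45), the sketch below integrates a hypothetical intensity function to obtain the mean value function and then evaluates the Poisson probabilities. The intensity $\lambda(x) = 10 e^{-0.5x}$ and the helper names are assumptions made for this example only.

```python
import math

def mean_value(intensity, t, steps=10_000):
    """m(t) = integral_0^t lambda(x) dx, approximated by the midpoint rule."""
    width = t / steps
    return sum(intensity((i + 0.5) * width) for i in range(steps)) * width

def nhpp_pmf(n, m_t):
    """P[N(t) = n] when the mean value function equals m_t at time t, equation (1.45)."""
    return m_t ** n * math.exp(-m_t) / math.factorial(n)

def decaying_intensity(x):
    # Hypothetical intensity; its exact integral over (0, t) is 20 (1 - e^{-0.5 t}).
    return 10.0 * math.exp(-0.5 * x)

m_at_2 = mean_value(decaying_intensity, 2.0)
total_probability = sum(nhpp_pmf(n, m_at_2) for n in range(101))
expected_count = sum(n * nhpp_pmf(n, m_at_2) for n in range(101))
```

The probabilities sum to one and the expected count reproduces $m(t)$, as the definition requires.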

1.4.4 SOFTWARE RELIABILITY GROWTH MODEL :

A Software Reliability Growth Model is defined as a mathematical relation between the time span of testing (or using) the software and the cumulative number of errors/faults detected.

1.4.5 SOFTWARE RELIABILITY :

Software Reliability is defined as the probability that a software failure does not occur in $(t, t + x)$, given that the most recent failure occurred at time $t \ge 0$, $x > 0$. It can be shown that for a model based on NHPP

$$R(x \mid t) = \exp\left\{-\left[m_f(t + x) - m_f(t)\right]\right\} \tag{1.46}$$
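Equation (1.46) depends only on the increment of the mean value function, so it can be evaluated for any NHPP model once $m_f(t)$ is specified. The sketch below assumes a hypothetical exponential mean value function with illustrative parameters $a = 100$ and $b = 0.1$.

```python
import math

def software_reliability(x, t, mean_value):
    """R(x | t) = exp(-[m(t + x) - m(t)]), equation (1.46)."""
    return math.exp(-(mean_value(t + x) - mean_value(t)))

def exponential_mean_value(t, a=100.0, b=0.1):
    # Hypothetical mean value function a (1 - e^{-bt}); a and b are illustrative only.
    return a * (1.0 - math.exp(-b * t))

# Reliability over one further time unit, early and late in testing.
early = software_reliability(1.0, 10.0, exponential_mean_value)
late = software_reliability(1.0, 60.0, exponential_mean_value)
```

Under this assumed mean value function the reliability over a fixed window increases with testing time, which is the growth behaviour the models aim to capture.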

1.4.6 FAILURE INTENSITY :

The Failure Intensity function is the rate of change of the mean value function, which represents the average cumulative number of failures associated with the given time point, i.e.

$$\lambda(t) = \frac{d}{dt}\left[m_f(t)\right] \tag{1.47}$$

It may also be defined as the number of failures per unit time.
1.4.7 MODELS BASED ON FAILURE INTENSITY :

Starting with the failure intensity function, Musa proposed two models (the second with Okumoto) through which it is attempted to predict the future failure behaviour and reliability of software.

1.4.7.1 Basic Execution Time Model (Musa) :

It has a failure intensity function which decays exponentially with execution time $t$, i.e.

$$\lambda(t) = \lambda_0 e^{-bt} \tag{1.48}$$

where $b$ is the rate of decrease per time (which is the same as the error detection rate per error). The initial failure intensity is given as

$$\lambda_0 = ab \tag{1.49}$$

The expression for the expected number of failures by time $t$ can be obtained from (1.48) as

$$m_f(t) = a\left(1 - e^{-bt}\right) \tag{1.50}$$

and the failure intensity can be expressed as a function of $m_f(t)$ as

$$\lambda(m_f) = b\left[a - m_f(t)\right] \tag{1.51}$$

(1.51) shows that the rate of change of failure intensity with respect to failures experienced is constant, whether it is the first failure or the last that is being fixed.
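The consistency of (1.48) to (1.51) can be verified numerically: the intensity computed from time must equal the intensity computed from the failures experienced. The parameter values $a = 120$ and $b = 0.05$ below are hypothetical.

```python
import math

A, B = 120.0, 0.05          # hypothetical fault content a and rate b
LAMBDA0 = A * B             # initial failure intensity, equation (1.49)

def basic_intensity(t):
    return LAMBDA0 * math.exp(-B * t)          # equation (1.48)

def basic_mean_value(t):
    return A * (1.0 - math.exp(-B * t))        # equation (1.50)

def intensity_from_failures(m):
    return B * (A - m)                         # equation (1.51)

t = 12.0
via_time = basic_intensity(t)
via_failures = intensity_from_failures(basic_mean_value(t))
# Each additional failure experienced lowers the intensity by the same amount, b.
drop_per_failure = intensity_from_failures(10.0) - intensity_from_failures(11.0)
```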
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i«4.7«2 


Logari thmic Poisson Model CHusa and Qkumoto) s 


This rfOdel has a failure intensity function which 

decays exponentially with respect to the mean value 

function m^Ct) 
f 

i«e« XCt)=Xe C 1 «52> 

o 

where b is rate of decrease per failure* From equation 
CiM52)^ we get 

mj.t} loo Ci+X bt)/b **.*«*C1*53> 

The **LoQari thmic Poisson liodel** is derived from the form 
Cl«53)» The failure intensity as a function of m^Ct) is 
given as 

X.(<n) = X E;e“*^3 mCt) (1.54) 

f o f 

This show that the rate of change of intensity with 
respect to failures BKperienced decreases eKponential ly 
with failures eKperienced and is not constant as in the 
case of basic execution time model. It means that the 
first failure initiates a repair process that yields a 
substantial decrease in failure intensity, while later 
failure results in much smaller decrements. This is 
because during testing, the software is first tested on 
frequently used inputs. 

Moreover it may be noted that by time infinity, the
failure intensity reduces to zero and the number of
failures experienced is infinity. This is possible when
either at the time of debugging faults are being
introduced or the debugging process is imperfect or each
fault generates more than one failure or when a combination
of these is possible. It is obvious in this case that
the parameters of interest are not the number of errors
in a software, but the failure intensity and the rate at
which failures are occurring.
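
The two execution time models can be compared numerically. The following Python sketch evaluates (1.48)-(1.54); the function names and the parameter values are hypothetical and chosen only for illustration. Note that b is the decay rate per unit time in the basic model but the decay rate per failure in the logarithmic Poisson model.

```python
import math


def basic_intensity(t, a, b):
    """Failure intensity (1.48) with lambda_0 = a * b from (1.49)."""
    return a * b * math.exp(-b * t)


def basic_mean(t, a, b):
    """Expected number of failures by time t, equation (1.50)."""
    return a * (1.0 - math.exp(-b * t))


def logpoisson_mean(t, lam0, b):
    """Expected number of failures by time t, equation (1.53)."""
    return math.log(1.0 + lam0 * b * t) / b


def logpoisson_intensity(t, lam0, b):
    """Failure intensity (1.52), evaluated along the mean value function."""
    return lam0 * math.exp(-b * logpoisson_mean(t, lam0, b))


if __name__ == "__main__":
    a, b, lam0 = 100.0, 0.05, 5.0  # illustrative values only
    for t in (0.0, 10.0, 100.0, 1000.0):
        print(t, basic_mean(t, a, b), logpoisson_mean(t, lam0, b))
```

Under these formulas the expected number of failures of the basic model levels off at a, while that of the logarithmic Poisson model keeps growing with t.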

1.4.8 EXPONENTIAL MODELS BASED ON MEAN VALUE FUNCTION :

This category of SRGMs is based on slightly
different assumptions. Models are formulated based on an
expression for the mean value function of the Poisson
Process rather than the failure intensity function and are
suitable for finite error content.

1.4.8.1 Goel-Okumoto Model (Goel & Okumoto (1979)) :

Assuming that the S-expected number of failures
in (t, t+\Delta t) is essentially proportional to the S-expected
number of undetected errors at time t, the following
equation can be easily written

    m_f(t+\Delta t) - m_f(t) = b(a - m_f(t)) \Delta t + o(\Delta t)    ...... (1.55)

where o(\Delta t)/\Delta t \to 0 as \Delta t \to 0. Letting \Delta t \to 0
in (1.55) gives

    m_f'(t) = b[a - m_f(t)]    ...... (1.56)

(1.56) together with m_f(0) = 0 gives

    m_f(t) = a(1 - e^{-bt})    ...... (1.57)

which is mathematically isomorphic to the mean value
function of Basic Execution Time Model. Here it is
assumed that the fault causing failure is immediately
removed and hence fault detection and failure occurrence
are assumed to be synonymous.

1.4.8.2 Exponential Model with Imperfect Debugging (Kapur and
Garg (1990)) :

During the testing phase of a software, on each failure an
attempt is made to correct the cause of the failure.
However, it is not always possible to find the cause of
the failure and remove it. This may be attributed to lack
of sufficient knowledge about the software, poor
documentation of the software and so on.

In this model it is assumed that on failure instantaneous
repair effort starts and the following may occur :
(i) fault content is reduced by one with probability p_0.
(ii) fault content is unchanged with probability 1 - p_0.
Based on these assumptions, the following differential equation
may be written

    m_r'(t) = b p_0 [a - m_r(t)]    ...... (1.58)

This gives

    m_r(t) = a(1 - e^{-b p_0 t})    ...... (1.59)

and, since each failure removes a fault with probability p_0
so that m_r(t) = p_0 m_f(t),

    m_f(t) = (a/p_0)(1 - e^{-b p_0 t})    ...... (1.60)

(1.60) implies that the expected number of failures by
infinite time will be greater than 'a', which is because
some faults may not be removed even though an
attempt was made to do so.
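
The effect of imperfect debugging on the two mean value functions can be seen in a few lines of Python. This is only a sketch of (1.59) and (1.60); the function names and the numerical values are hypothetical.

```python
import math


def removals(t, a, b, p0):
    """m_r(t) of (1.59): expected number of faults removed by time t."""
    return a * (1.0 - math.exp(-b * p0 * t))


def failures(t, a, b, p0):
    """m_f(t) of (1.60): expected number of failures by time t."""
    return removals(t, a, b, p0) / p0


if __name__ == "__main__":
    a, b, p0 = 50.0, 0.1, 0.8  # illustrative values only
    for t in (10.0, 50.0, 500.0):
        print(t, removals(t, a, b, p0), failures(t, a, b, p0))
```

For large t the removals approach a while the failures approach a/p_0, which exceeds a whenever p_0 < 1.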

1.4.8.3 Exponential Model with Testing Effort
(Yamada et al (1986)) :

In general some resources like manpower, C.P.U.
time etc. are spent during the testing phase of software
development. The consumption curve of testing resources
over the testing period can be thought of as a testing
effort curve.

In this model it is assumed that the ratio of the S-expected
number of errors detected in the time interval (t, t+\Delta t) to the
current testing effort expenditure is proportional to the
S-expected number of remaining errors. So,

    m_f'(t) / w(t) = b[a - m_f(t)]    ...... (1.61)

where w(t) is the testing effort expended at time t. This gives

    m_f(t) = a(1 - e^{-b W(t)}),   W(t) = \int_0^t w(x) dx    ...... (1.62)

It is generally assumed that the cumulative testing effort
W(t) in the software development process follows an
exponential or Rayleigh curve, i.e.

    W(t) = \alpha(1 - e^{-\beta t})    ...... (1.63)

or  W(t) = \alpha(1 - e^{-\beta t^2/2})    ...... (1.64)

Values of \alpha and \beta are estimated from (1.63) or
(1.64). These estimated values of \alpha and \beta are then put
in (1.62) to get the estimates of a and b.

Since resources are finite, some number of faults will
remain undetected by time infinity.
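
The last remark can be checked numerically. The sketch below assumes the mean value function m_f(t) = a(1 - e^{-bW(t)}) with the exponential and Rayleigh forms of the cumulative effort W(t); all names and values are hypothetical.

```python
import math


def effort_exponential(t, alpha, beta):
    """Cumulative testing effort of exponential type."""
    return alpha * (1.0 - math.exp(-beta * t))


def effort_rayleigh(t, alpha, beta):
    """Cumulative testing effort of Rayleigh type."""
    return alpha * (1.0 - math.exp(-beta * t * t / 2.0))


def mean_failures(t, a, b, effort, alpha, beta):
    """Mean value function a(1 - exp(-b W(t))) for a given effort curve."""
    return a * (1.0 - math.exp(-b * effort(t, alpha, beta)))


if __name__ == "__main__":
    a, b, alpha, beta = 100.0, 0.02, 50.0, 0.1  # illustrative values only
    for t in (1.0, 5.0, 20.0, 1000.0):
        print(t,
              mean_failures(t, a, b, effort_exponential, alpha, beta),
              mean_failures(t, a, b, effort_rayleigh, alpha, beta))
```

Because W(t) tends to the finite total effort \alpha, the expected number of detected faults tends to a(1 - e^{-b\alpha}), which is strictly less than a.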

1.4.9 S-SHAPED SOFTWARE RELIABILITY GROWTH PHENOMENON :

It is generally accepted that there exists an
S-shaped software reliability growth phenomenon that is
observed in the testing phase of the development of
software. It has been interpreted in different ways. Ohba
(1984) interprets it as initial delay of fault isolation
after the initial failure detection. Yamada (1984)
interprets it as unskilledness of the testing team. Ohba
(1984) developed a SRGM for a situation where a fault in a
software is dependent on the previous fault detected, and the
curve of the cumulative number of failures/faults detected in
this model also shows the S-shaped phenomenon. This same S-shaped
phenomenon is again observed when the testing effort curve
of Yamada et al (1986) follows a Rayleigh type of distribution.
Again in the flexible model of Bittanti (1988) this phenomenon
is observed when the error detection rate per error is
increasing.

1.4.9.1 Delayed S-shaped SRGM (Ohba (1984)) :

Fault removal in this model is assumed to be a two
phase process consisting of failure detection and its
eventual removal by isolation. It takes care of the
time taken to isolate and remove a fault and so it is
important that the data to be used here should be that of
fault isolation.

It is further assumed that the number of faults isolated
at any time is proportional to the current number of faults
not isolated. Failure rate and isolation rate per error
are assumed to be same and equal to b. Thus

    m_f'(t) = b[a - m_f(t)]    ...... (1.65)

    m_r'(t) = b[m_f(t) - m_r(t)]    ...... (1.66)

Solving these we get the mean value function as

    m_r(t) = a[1 - (1 + bt)e^{-bt}]    ...... (1.67)

Thus this model is called S-shaped because the graph of
cumulative number of faults removed with respect to time
has a jump at the initial portion of the graph. This model
can further be extended depending upon the severity of
the error in the software.

1.4.9.2 Inflection S-shaped SRGM (Ohba (1984)) :

The mean value function of this model is given by

    m_f(t) = a (1 - e^{-bt}) / (1 + \psi e^{-bt})    ...... (1.68)

where \psi is called the inflection parameter and is equal to
(1 - r)/r, where r is the ratio of the number of detectable
faults to the total number of faults.

This model takes care of the situation where errors/faults are
mutually dependent and is based on the assumption that
the error detection rate per error increases throughout
the test period.
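
The S shape of both curves can be inspected numerically. The following sketch evaluates (1.67) and (1.68) and estimates the slope by a central difference; the helper `slope` and all values are hypothetical and only meant as an illustration.

```python
import math


def delayed_s(t, a, b):
    """Delayed S-shaped mean value function (1.67)."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def inflection_s(t, a, b, r):
    """Inflection S-shaped mean value function (1.68), psi = (1 - r) / r."""
    psi = (1.0 - r) / r
    e = math.exp(-b * t)
    return a * (1.0 - e) / (1.0 + psi * e)


def slope(f, t, h=1e-5):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


if __name__ == "__main__":
    a, b = 100.0, 0.1  # illustrative values only
    for t in (5.0, 10.0, 20.0):
        print(t, slope(lambda u: delayed_s(u, a, b), t))
```

For the delayed model the removal rate a b^2 t e^{-bt} first rises and then falls, with its peak at t = 1/b, and for r = 1 the inflection model reduces to the exponential curve a(1 - e^{-bt}).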

1.4.9.3 Exponential with Rayleigh Type Testing Effort (Yamada et
al (1986)) :

As discussed in (1.4.8.3), its mean value function
is given by

    m_f(t) = a(1 - e^{-b W(t)})    ...... (1.69)

where W(t) is the cumulative testing effort. And when the
testing effort curve follows a Rayleigh curve given by

    W(t) = \alpha(1 - e^{-\beta t^2/2})    ...... (1.70)

then the reliability growth phenomenon is S-shaped.

1.4.10 DISCRETE SOFTWARE RELIABILITY GROWTH MODELS :

In this class of SRGMs the number of test runs or
the number of executed test cases is taken as the unit of
error detection period. The random variable of the NHPP is
defined as the number of errors detected/removed by n test
runs.

For this class of SRGMs the assumptions are modified in the
discrete sense. Little work has been done in this class of
SRGMs.

1.4.10.1 Error Content Proportional Detection Rate Model (Yamada
et al (1985)) :

It is assumed that the expected number of failures
between the nth and (n+1)th executed test cases is
proportional to the expected number of faults remaining
in it, i.e.

    m_f(n+1) - m_f(n) = b[a - m_f(n)]    ...... (1.71)

Solving the difference equation, we get

    m_f(n) = a[1 - (1-b)^n]    ...... (1.72)

and \lambda(n) = ab(1-b)^n    ...... (1.73)

1.4.10.2 Geometric Error Detection Rate SRGM (Yamada et al (1985)) :

This SRGM is developed under the assumption
that the ratio of the error detection rate for any test
run and the rate for its predecessor is a constant less
than unity, and the expected number of failures per test
is geometrically decreasing. So

    m_f(n+1) - m_f(n) = D r^n    ...... (1.74)

where r is the decreasing ratio for the expected number of
errors detected per test run and D is the initial
expected number of errors detected by the first test
run. Thus

    m_f(n) = D (1 - r^n)/(1 - r)    ...... (1.75)

And the expected number of errors to be eventually
detected, i.e. the expected initial error content, is given
by

    m_f(\infty) = D/(1-r)    ...... (1.76)

1.4.10.3 Discrete S-shaped SRGM (Kapur et al (1990)) :

This model takes care of the delay between the
failure caused by a fault and its subsequent detection.
It is a two phase phenomenon and so the mean value
function of the model is given by

    m_r(n) = a[1 - (1 + bn)(1-b)^n]    ...... (1.77)
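
The closed forms of the discrete models can be checked against their difference equations. The sketch below iterates (1.71) and (1.74) from m_f(0) = 0 and compares the result with (1.72) and (1.75); the function names are hypothetical.

```python
def proportional_recurrence(n, a, b):
    """Iterate the difference equation (1.71) starting from m_f(0) = 0."""
    m = 0.0
    for _ in range(n):
        m += b * (a - m)
    return m


def proportional_closed(n, a, b):
    """Closed form (1.72)."""
    return a * (1.0 - (1.0 - b) ** n)


def geometric_recurrence(n, D, r):
    """Iterate the difference equation (1.74) starting from m_f(0) = 0."""
    m = 0.0
    for k in range(n):
        m += D * r ** k
    return m


def geometric_closed(n, D, r):
    """Closed form (1.75)."""
    return D * (1.0 - r ** n) / (1.0 - r)


def discrete_s_shaped(n, a, b):
    """Mean value function (1.77) of the discrete S-shaped model."""
    return a * (1.0 - (1.0 + b * n) * (1.0 - b) ** n)


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, proportional_closed(n, 100.0, 0.1),
              geometric_closed(n, 5.0, 0.9),
              discrete_s_shaped(n, 100.0, 0.1))
```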

1.4.11 ESTIMATION OF PARAMETERS OF SRGM :

Our main aim in software reliability modelling is
to estimate different model parameters so that the future
behaviour of the software can be predicted. Suppose that the
data on n failure times (or isolation or removal times)
S = (s_1, s_2, ..., s_n), where 0 < s_1 < s_2 < ... < s_n, are
observed during testing. Then the likelihood function for the
unknown parameters of the model given S is given by

    L = e^{-m(s_n)} \prod_{i=1}^{n} \lambda(s_i)    ...... (1.78)

where

    \lambda(s_i) = \frac{d}{dt} m(t) |_{t = s_i}    ...... (1.79)

where m(t) is the mean value function of the underlying
NHPP. From (1.78) maximum likelihood estimates of the
parameters can easily be obtained.

Alternatively, suppose that the data on cumulative number
of failures (or isolations or removals) y_k
(0 < y_1 < y_2 < ... < y_n) in given time intervals (0, t_k]
are observed. Then the likelihood function of the unknown
parameters is given by

    L = \prod_{k=1}^{n} \frac{[m(t_k) - m(t_{k-1})]^{y_k - y_{k-1}}}{(y_k - y_{k-1})!} e^{-[m(t_k) - m(t_{k-1})]}    ...... (1.80)

with t_0 = 0 and y_0 = 0. From (1.80), estimates of the unknown
parameters can easily be obtained.
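
As an illustration of how (1.78) may be maximised, the sketch below fits the exponential mean value function m(t) = a(1 - e^{-bt}) to a set of failure times. It is a minimal example and not a general estimation routine: the failure times are made up, the function names are hypothetical, a is eliminated analytically for fixed b, and a ternary search over b assumes that the profile likelihood has a single peak in the search interval.

```python
import math


def log_likelihood(times, a, b):
    """Log of (1.78) for m(t) = a(1 - exp(-b t)), lambda(t) = a b exp(-b t)."""
    n = len(times)
    s_n = times[-1]
    return (-a * (1.0 - math.exp(-b * s_n))
            + n * math.log(a * b) - b * sum(times))


def profile_a(times, b):
    """Value of a that maximises the likelihood for a fixed b."""
    return len(times) / (1.0 - math.exp(-b * times[-1]))


def fit(times, b_lo=1e-6, b_hi=1.0, iterations=200):
    """Ternary search on the profile log-likelihood in b (one peak assumed)."""
    def prof(b):
        return log_likelihood(times, profile_a(times, b), b)

    for _ in range(iterations):
        m1 = b_lo + (b_hi - b_lo) / 3.0
        m2 = b_hi - (b_hi - b_lo) / 3.0
        if prof(m1) < prof(m2):
            b_lo = m1
        else:
            b_hi = m2
    b = 0.5 * (b_lo + b_hi)
    return profile_a(times, b), b


if __name__ == "__main__":
    # Hypothetical failure times with lengthening gaps (reliability growth).
    times = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
    print(fit(times))
```

Eliminating a first keeps the search one-dimensional; for data that show no reliability growth the maximum may lie at the boundary b -> 0, in which case the search interval has to be reconsidered.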


1.4.12 PREDICTIVE VALIDITY OF MODELS (MUSA ET AL (1989)) :

It is the ability of the model to determine future
failure/removal behaviour during either the test or the
operational phase from present and past failure/removal
behaviour in the respective phase. It is a very effective
tool to compare the applicability of models on a particular
data set.


Here we attempt to predict the number of failures/removals
that will be experienced by the end of the period of
testing over which the data has been collected and compare
this with the actual values. Assume that n failures/removals
have been observed by the end of the time S_n, where S_i is
the time to the ith failure/removal.

Clearly, 0 < S_1 < S_2 < ... < S_n. The failure/removal data
up to time S_i (< S_n) is used to estimate the parameters of
the mean value function m_f(t) or m_r(t). Then the number of
failures/removals by time S_n can be predicted by substituting
the estimates of the parameters in m_f(t) or m_r(t), which is
compared with the actually observed number n. This is repeated
for various values of S_i.

The predictive validity can be checked visually by plotting
the normalised relative number of errors or removals
[m_f(S_n) - n]/n or [m_r(S_n) - n]/n against the normalised
time S_i/S_n. The error will approach zero as S_i approaches
S_n. If the points are positive (negative) the model tends to
overestimate (underestimate). Numbers closer to zero imply
more accurate prediction and hence the better model.
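
The quantities that are plotted can be computed as in the following sketch. It assumes that the predictions m(S_n) for the successive truncation points have already been obtained from some fitted model; the function names and the numbers in the usage part are hypothetical.

```python
def relative_errors(predictions, n_observed):
    """Relative errors (m(S_n) - n) / n, one per truncation point S_i."""
    return [(m_hat - n_observed) / n_observed for m_hat in predictions]


def normalised_times(truncation_times, s_n):
    """Normalised truncation times S_i / S_n."""
    return [s / s_n for s in truncation_times]


if __name__ == "__main__":
    # Hypothetical predictions of the final count made at 25%, 50%, 75%, 100%.
    predictions = [120.0, 108.0, 97.0, 100.0]
    print(list(zip(normalised_times([25.0, 50.0, 75.0, 100.0], 100.0),
                   relative_errors(predictions, 100))))
```

Positive values in the output correspond to overestimation and negative values to underestimation of the observed count.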

1.4.13 OPTIMAL RELEASE POLICY :

It is of utmost importance to find the
appropriate release time of the software. If the release of
the software is unduly delayed, the manufacturer may suffer in
terms of penalties and revenue loss, while a premature
release may cost heavily in terms of the fixes to be done
after release and may even harm the manufacturer's reputation.
Therefore, the manufacturer must have some idea about the
possible attributes of the software like its initial error
content, failure rate, reliability at time t and its
potential release time. Thus it is of interest to know when
to stop testing and release the software. The software release
time problem has been classified in different ways.

One is, when to release a software so that the cost incurred
during the life cycle of the software, i.e. during development
and operational phases, is minimised or reliability
reaches a desired level.

Software cost includes the cost of removing errors both
before and after release of the software and testing cost
during testing phase. Assuming that the software is
released at time T, the cost function in general can be written
as

    C(T) = C_1 m_r(T) + C_2 [m_r(T_k) - m_r(T)] + C_3 T    ...... (1.81)

Here C_1 and C_2 are understood as the costs of removing an
error during the testing and operational phases respectively,
C_3 as the testing cost per unit time and T_k as the length of
the software life cycle.

Thus the problem of finding the optimal release time reduces to

    min C(T)

subject to

    R(x | T) \ge R_0  or  \lambda(T) \le \lambda_d    ...... (1.82)

    T \ge 0

where R_0 and \lambda_d denote the desired levels of reliability
and failure intensity. For the discrete models, T is replaced
by n, the number of executed test cases.

Alternatively, this problem can be redefined in terms of
maximizing gain, which is defined as the difference in cost
incurred when all the errors are removed during the operational
phase as against the cost when some errors are removed
during the testing phase and others are removed during the
operational phase. Thus denoting it by G(T), we have

    G(T) = (C_2 - C_1) m_r(T) - C_3 T    ...... (1.83)

It can be seen that maximizing gain is same as minimizing
cost.

We have so far assumed that the software life cycle is
constant. However, in real situations it might be more
appropriate to assume it to be a random variable because a
software system may be abandoned as new versions become
available. Recently Yun and Bai [ ] discussed a software
release policy maximizing profit alone, taking into account
the price of the software and the expenditure incurred on
testing and on correcting errors during testing and operation.
One may also incorporate the idea of penalty cost which is
incurred by the manufacturer by not delivering the software by
the scheduled delivery time.
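
For a concrete mean value function the unconstrained release time can be written down explicitly. The sketch below assumes m_r(T) = a(1 - e^{-bT}) (the exponential model), ignores the constraints (1.82), and uses hypothetical cost values. Setting dC/dT = 0 in (1.81) then gives T* = (1/b) log[ab(C_2 - C_1)/C_3] whenever this quantity is positive.

```python
import math


def mean_removed(t, a, b):
    """Assumed mean value function m_r(t) = a(1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def cost(T, a, b, c1, c2, c3, T_k):
    """Cost function (1.81)."""
    return (c1 * mean_removed(T, a, b)
            + c2 * (mean_removed(T_k, a, b) - mean_removed(T, a, b))
            + c3 * T)


def gain(T, a, b, c1, c2, c3):
    """Gain function (1.83)."""
    return (c2 - c1) * mean_removed(T, a, b) - c3 * T


def optimal_release(a, b, c1, c2, c3):
    """Unconstrained minimiser of (1.81); returns 0 if testing never pays."""
    ratio = a * b * (c2 - c1) / c3
    return math.log(ratio) / b if ratio > 1.0 else 0.0


if __name__ == "__main__":
    a, b, c1, c2, c3, T_k = 100.0, 0.05, 1.0, 5.0, 2.0, 500.0
    T_star = optimal_release(a, b, c1, c2, c3)
    print(T_star, cost(T_star, a, b, c1, c2, c3, T_k))
```

In this setting C(T) + G(T) equals the constant C_2 m_r(T_k), which is why maximizing gain and minimizing cost lead to the same release time.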

1.5 : THE PROPORTIONAL HAZARDS MODEL :

Let \lambda(t; x) represent the hazard function at time
t for an individual with covariates x.

The proportional hazards model due to Cox (1972) specifies
that

    \lambda(t; x) = \lambda_0(t) e^{x\beta}    ...... (1.5.1)

where \lambda_0(t) is an arbitrary unspecified base line hazard
function for continuous T, x is a row vector of k measured
covariates, \beta is a column vector of k regression parameters
and T is the associated failure time.

In this model, the covariates act multiplicatively on the
hazard function. If \lambda_0(t) = \lambda, the model (1.5.1)
reduces to the exponential regression model

    \lambda(t; x) = \lambda e^{x\beta}    ...... (1.5.2)

If \lambda_0(t) = \lambda p (\lambda t)^{p-1}, then the model
reduces to

    \lambda(t; x) = \lambda p (\lambda t)^{p-1} e^{x\beta}    ...... (1.5.3)

Corresponding to model (1.5.1), the conditional density function
of T given x is

    f(t; x) = \lambda_0(t) e^{x\beta} \exp[-e^{x\beta} \int_0^t \lambda_0(u) du]    ...... (1.5.4)

The conditional survivor function for T given x is

    S(t; x) = [S_0(t)]^{\exp(x\beta)}    ...... (1.5.5)

where S_0(t) = \exp[-\int_0^t \lambda_0(u) du].

Thus the survivor function of T for a covariate value x is
obtained by raising the base line survivor function S_0(t) to
a power.
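
The proportionality can be verified numerically. The sketch below assumes a Weibull base line hazard, as in (1.5.3), and evaluates the hazard and the survivor function (1.5.5); the function names and the numerical values are hypothetical.

```python
import math


def baseline_hazard(t, lam, p):
    """Weibull base line hazard lam * p * (lam * t)^(p - 1)."""
    return lam * p * (lam * t) ** (p - 1.0)


def baseline_survivor(t, lam, p):
    """Weibull base line survivor function exp(-(lam * t)^p)."""
    return math.exp(-((lam * t) ** p))


def linear_predictor(x, beta):
    """Inner product x * beta of a covariate row vector and the parameters."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def ph_hazard(t, x, beta, lam, p):
    """Hazard (1.5.3): base line hazard times exp(x * beta)."""
    return baseline_hazard(t, lam, p) * math.exp(linear_predictor(x, beta))


def ph_survivor(t, x, beta, lam, p):
    """Survivor function (1.5.5): base line survivor raised to exp(x * beta)."""
    return baseline_survivor(t, lam, p) ** math.exp(linear_predictor(x, beta))


if __name__ == "__main__":
    beta = (0.5, -0.25)  # illustrative values only
    for t in (0.5, 1.0, 4.0):
        print(t, ph_hazard(t, (1.0, 0.0), beta, 0.2, 1.5)
              / ph_hazard(t, (0.0, 2.0), beta, 0.2, 1.5))
```

The printed hazard ratio is the same at every t, which is the defining property of the model.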

There are two important generalisations that do not
substantially complicate the estimation of \beta. First, the
nuisance function \lambda_0(t) can be allowed to vary in specific
subsets of the data. Suppose the data is divided into k strata
and that the hazard \lambda_j(t; x) in the jth stratum depends on
an arbitrary function \lambda_{0j}(t) and can be written

    \lambda_j(t; x) = \lambda_{0j}(t) e^{x\beta}    ...... (1.5.6)

for j = 1, 2, ..., k.

Such a generalisation is useful when some explanatory variable
or variables do not appear to have a multiplicative effect on
the hazard function. The range of such variables can then be
divided into strata, with only the remaining regression
variables contributing to the exponential factor in (1.5.6).

The second important generalisation allows the regression
variables x to depend on time itself. Such regression
variables arise in the heart transplant study, where the
treatment group itself is time dependent, as are certain donor
recipient matching variables.

1.5.1 THE ACCELERATED FAILURE TIME MODEL :

The model specified in (1.5.1) describes the
multiplicative effect of regression variables on the hazard
function. This model does not postulate a direct relationship
between x and T. As we know, the exponential and Weibull
regression models are linear in

    Y = log T

For example, in the exponential regression model the
conditional density function of T given x is

    f(t; x) = \lambda e^{x\beta} \exp(-\lambda t e^{x\beta})

    \log f(t; x) = \log \lambda + x\beta - \lambda t e^{x\beta}

and Y can be written as

    Y = \alpha - x\beta + W

with \alpha = -\log \lambda and W an error variable.

Here we obtain a second class of survival models, the class
of log linear models for T.

Suppose that Y = log T is related to the covariates x via a
linear model

    Y = x\beta + U    ...... (1.5.7)

where U is an error variable with density f.

Exponentiation gives

    T = e^Y = e^{x\beta + U} = e^{x\beta} e^U = T' e^{x\beta}    ...... (1.5.8)

where T' = e^U > 0 has hazard function \lambda_0(t) that is
independent of \beta.

The hazard function for T can be written in terms of the base
line hazard \lambda_0(t) as

    \lambda(t; x) = \lambda_0(t e^{-x\beta}) e^{-x\beta}    ...... (1.5.9)

The survivor function is

    S(t; x) = \exp[-\int_0^{t e^{-x\beta}} \lambda_0(u) du]    ...... (1.5.10)

and the probability density function is the product of
equation (1.5.9) and equation (1.5.10) as

    f(t | x) = \lambda(t | x) S(t | x)
             = \lambda_0(t e^{-x\beta}) e^{-x\beta} \exp[-\int_0^{t e^{-x\beta}} \lambda_0(u) du]    ...... (1.5.11)

This model specifies that the effect of the covariate is
multiplicative on t rather than on the hazard function. That
is, we assume a base line hazard function to exist and that
the effect of the regression variables is to alter the rate
at which an individual proceeds along the time axis. It is
supposed that the role of x is to accelerate the time to
failure. The model (1.5.9) is known as the Accelerated Failure
Time Model.

All the parametric models lead to linear models for Y = log T.
The exponential and Weibull regression models can be considered
as special cases of either the Accelerated Failure Time
Model or the Proportional Hazards Model. However, the log
linear models derived from other parametric models are not
special cases of the proportional hazards models. For example,
log-normal hazard functions with different location
parameters \alpha_1 and \alpha_2 are generally not proportional
to one another.


1.5.2 COMPARISON OF PROPORTIONAL HAZARD MODELS AND ACCELERATED
FAILURE TIME MODELS :

Let us consider the intersection of the
proportional hazard models and the Accelerated Failure Time
Models. Consider the subset of log-linear models in which
the regression variable acts multiplicatively on the hazard
function. Consider the proportional hazard model

    \lambda(t; x) = \lambda_{01}(t) e^{x\beta_1}    for all (t, x)

and the accelerated failure time model (written here with the
sign of the regression vector absorbed into \beta_2)

    \lambda(t; x) = \lambda_{02}(t e^{x\beta_2}) e^{x\beta_2}    for all (t, x)

i.e.

    \lambda_{01}(t) e^{x\beta_1} = \lambda_{02}(t e^{x\beta_2}) e^{x\beta_2}

The value x = 0 gives \lambda_{01}(.) = \lambda_{02}(.) = \lambda_0(.),
while x = (-\log t/\beta_{21}, 0, ..., 0) gives at that t

    \lambda_0(t) t^{-\beta_{11}/\beta_{21}} = \lambda_0(1) t^{-1}

where \beta_{11} and \beta_{21} are the first components of
\beta_1 and \beta_2,

    \beta_1 = (\beta_{11}, \beta_{12}, ..., \beta_{1k})',   \beta_2 = (\beta_{21}, \beta_{22}, ..., \beta_{2k})'

It follows that for all t

    \lambda_0(t) = \lambda p (\lambda t)^{p-1}

where p = \beta_{11}/\beta_{21} and \lambda = [\lambda_0(1)/p]^{1/p}.


Note that with this Weibull base line hazard the two forms
agree for every x only if \beta_1 = p \beta_2.




The Weibull and Exponential log-linear models are then the
only log-linear models in (1.5.1). This leads to a
characterization of the two parameter Weibull model as the
unique family that is closed under both multiplication of
failure time and multiplication of the hazard function by an
arbitrary non-zero constant.
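
The closure property can be illustrated numerically for a single covariate. The sketch below assumes a Weibull base line hazard and checks that the proportional hazards form with coefficient p \beta_2 coincides with the accelerated failure time form \lambda_0(t e^{x\beta_2}) e^{x\beta_2}; the function names and values are hypothetical.

```python
import math


def weibull_hazard(t, lam, p):
    """Weibull base line hazard lam * p * (lam * t)^(p - 1)."""
    return lam * p * (lam * t) ** (p - 1.0)


def ph_form(t, x, beta1, lam, p):
    """Proportional hazards form: lambda_0(t) * exp(x * beta1)."""
    return weibull_hazard(t, lam, p) * math.exp(x * beta1)


def aft_form(t, x, beta2, lam, p):
    """Accelerated failure time form: lambda_0(t * s) * s, s = exp(x * beta2)."""
    scale = math.exp(x * beta2)
    return weibull_hazard(t * scale, lam, p) * scale


if __name__ == "__main__":
    lam, p, beta2 = 0.3, 1.7, 0.4  # illustrative values only
    for t in (0.5, 2.0, 9.0):
        print(t, ph_form(t, 1.0, p * beta2, lam, p), aft_form(t, 1.0, beta2, lam, p))
```

The two columns agree because rescaling time by e^{x\beta_2} multiplies the Weibull hazard by e^{p x\beta_2}; for a base line hazard that is not of the power form this agreement is lost.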

1.6 : DISCRETE FAILURE TIME MODELS :

The models so far discussed are appropriate for
failure time data arising from continuous distributions.
However, failure time data is sometimes discrete, which arises
either through the grouping of continuous data due to imprecise
measurement or because the time itself is discrete.

Let X have a Weibull distribution with survivor function

    S(x) = \exp[-(\lambda x)^p]    ...... (1.6.1)

and let times be grouped into unit intervals so that the
discrete observed variable is

    T = [X], where [X] represents the "integer part of X".

The probability function of T is given by

    p(t) = P(T = t) = P[t \le X < t+1]
         = P(X \ge t) - P(X \ge t+1)
         = \theta^{t^p} - \theta^{(t+1)^p},   t = 0, 1, 2, ...    ...... (1.6.2)

where \theta = \exp(-\lambda^p), 0 < \theta < 1.

The special case p = 1 is the geometric distribution with
probability function

i.e.    p(t) = \theta^t (1 - \theta),   t = 0, 1, 2, ...

The hazard function corresponding to (1.6.2) is

    \lambda(t) = P[T = t | T \ge t]
               = 1 - \theta^{(t+1)^p - t^p}    ...... (1.6.3)

(i) \lambda(t) is monotonic increasing for p > 1.

(ii) \lambda(t) is monotonic decreasing for p < 1.

(iii) \lambda(t) is constant for p = 1.

This can be generalised to a regression model by applying
the same grouping to the Weibull Regression Model.
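
The three cases can be checked directly from (1.6.2) and (1.6.3). The following sketch is illustrative only; the function names and the value of \theta are hypothetical.

```python
import math


def theta_from(lam, p):
    """theta = exp(-lam^p), as defined below (1.6.2)."""
    return math.exp(-(lam ** p))


def pmf(t, theta, p):
    """Probability function (1.6.2) of the grouped Weibull variable."""
    return theta ** (t ** p) - theta ** ((t + 1) ** p)


def hazard(t, theta, p):
    """Discrete hazard (1.6.3)."""
    return 1.0 - theta ** ((t + 1) ** p - t ** p)


if __name__ == "__main__":
    theta = 0.8  # illustrative value only
    for p in (0.7, 1.0, 1.5):
        print(p, [round(hazard(t, theta, p), 4) for t in range(5)])
```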

1.6.1 DISCRETE PROPORTIONAL HAZARDS MODEL :

Let the failure time T given covariates x have a
discrete distribution with mass points at 0 \le x_1 < x_2 < ...
and so on (the scalars x_i denote mass points, not components
of the covariate vector x). Let S_0(t) represent the base line
survivor function for x = 0. The corresponding survivor
function for covariates x is

    S(t; x) = [S_0(t)]^{\exp(x\beta)}    ...... (1.6.4)

If the hazard function corresponding to S_0(t) has contribution
\lambda_i at x_i, then

    S_0(t) = \prod_{i | x_i < t} (1 - \lambda_i)

and

    S(t; x) = \prod_{i | x_i < t} (1 - \lambda_i)^{\exp(x\beta)}    ...... (1.6.5)

The hazard at x_i for covariate x is then

    1 - (1 - \lambda_i)^{\exp(x\beta)}    ...... (1.6.6)




Discrete model (1.6.6) can also be obtained by grouping the
continuous model. If the continuous failure times arising
from the proportional hazard model are grouped into
disjoint intervals

    [0 = a_0, a_1), [a_1, a_2), ..., [a_{k-1}, a_k = \infty)

the hazard of failure in the ith interval for an individual
with covariate x is

    P[T \in [a_{i-1}, a_i) | T \ge a_{i-1}] = 1 - (1 - \lambda_i)^{\exp(x\beta)}    ...... (1.6.7)

where

    \lambda_i = 1 - \exp[-\int_{a_{i-1}}^{a_i} \lambda_0(u) du]

This discrete model is then the uniquely appropriate one for
grouped data from the continuous proportional hazard model.
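
The agreement between (1.6.7) and the continuous model can be confirmed numerically. The sketch below assumes a Weibull cumulative base line hazard (\lambda t)^p purely for illustration and compares the grouped hazard with the conditional probability computed directly from the continuous survivor function; all names and values are hypothetical.

```python
import math


def cumulative_baseline(t, lam, p):
    """Assumed cumulative base line hazard (lam * t)^p (Weibull)."""
    return (lam * t) ** p


def baseline_interval_hazard(a_prev, a_next, lam, p):
    """lambda_i = 1 - exp(-integral of lambda_0 over [a_prev, a_next))."""
    return 1.0 - math.exp(-(cumulative_baseline(a_next, lam, p)
                            - cumulative_baseline(a_prev, lam, p)))


def grouped_hazard(lambda_i, xb):
    """Right-hand side of (1.6.7) for linear predictor xb = x * beta."""
    return 1.0 - (1.0 - lambda_i) ** math.exp(xb)


def direct_hazard(a_prev, a_next, lam, p, xb):
    """P[T in [a_prev, a_next) | T >= a_prev] from the continuous model."""
    s_prev = math.exp(-cumulative_baseline(a_prev, lam, p) * math.exp(xb))
    s_next = math.exp(-cumulative_baseline(a_next, lam, p) * math.exp(xb))
    return 1.0 - s_next / s_prev


if __name__ == "__main__":
    lam, p, xb = 0.2, 1.3, 0.7  # illustrative values only
    for a_prev, a_next in ((0.0, 1.0), (1.0, 2.5), (2.5, 6.0)):
        lam_i = baseline_interval_hazard(a_prev, a_next, lam, p)
        print(grouped_hazard(lam_i, xb), direct_hazard(a_prev, a_next, lam, p, xb))
```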


If the discrete base line hazard function is given by

    \lambda_d(t) dt = \sum_i \lambda_i \delta(t - x_i) dt    ...... (1.6.8)

we see that the hazard function for covariates x is

    \lambda(t; x) dt = 1 - (1 - \lambda_d(t) dt)^{\exp(x\beta)}    ...... (1.6.9)

If \lambda_d(t) is replaced with a continuous hazard \lambda_0(t)
in (1.6.9), the relationship is precisely

    \lambda(t; x) = \lambda_0(t) e^{x\beta}

Thus (1.6.9) holds whether \lambda_0(t) is the base line hazard
function (x = 0) of a discrete, continuous or mixed random
variable. The relationship between the survivor and hazard
function is

    S(t; x) = \mathcal{P}_0^t [1 - \lambda(u; x) du]
            = \mathcal{P}_0^t [1 - \lambda_0(u) du]^{\exp(x\beta)}

where \mathcal{P} is the product integral, defined as

    \mathcal{P}_0^t [1 - d\Lambda(u)] = \lim \prod_{k=1}^{r} {1 - [\Lambda(u_k) - \Lambda(u_{k-1})]}

Here 0 = u_0 < u_1 < ... < u_r = t and the limit is taken as
r \to \infty and \max_k (u_k - u_{k-1}) \to 0, so that

    S(t) = \mathcal{P}_0^t [1 - \lambda(u) du]


1.7 : METHODS OF ESTIMATION OF PROPORTIONAL HAZARD MODELS :

Here our attention is focused on different
methods of estimation for data arising from the proportional
hazards model.

In the parametric case the failure time distribution is assumed
known except for a few scalar parameters. The proportional
hazards model, however, is non parametric in the sense that
it involves an unspecified function in the form of an
arbitrary base-line hazard function. In consequence, this
model is more flexible but different approaches are required
for estimation.

The main problems addressed are those of estimation of \beta and
\lambda_0(t). The different methods of estimating the parameters
\beta are

1. Method of Marginal Likelihood.

2. Method of Partial Likelihood.

3. Breslow's Maximum Likelihood Method.

1.7.1 METHOD OF MARGINAL LIKELIHOOD :

Suppose that n individuals are observed to fail
at t_1, t_2, ..., t_n with corresponding covariates
x_1, x_2, ..., x_n.

We assume that all failures are distinct.

Let O(t) = (t_{(1)}, t_{(2)}, ..., t_{(n)}) be the order statistics
and r(t) = ((1), (2), ..., (n)) be the rank statistics. The order
statistics refer to the t_i's ordered from smallest to
largest, i.e. t_{(1)} < t_{(2)} < ... < t_{(n)}, and the notation (i)
in the rank statistics refers to the label attached to the ith
order statistic.

Consider the model (1.5.1) and define u = g^{-1}(t) where g \in G,
the group of strictly increasing and differentiable
transformations of (0, \infty) onto (0, \infty). The conditional
distribution of u given x has the hazard

    \lambda(u; x) = \lambda_1(u) e^{x\beta}

where

    \lambda_1(u) = \lambda_0(g(u)) g'(u)

Thus, if the data were presented in the form of
u_1, u_2, ..., u_n and x_1, x_2, ..., x_n, where g(u_i) = t_i,
i = 1, ..., n, the inference problem about \beta would be the same
provided \lambda_0(.) were completely unknown. In effect the
estimation problem for \beta is invariant under the group G of
transformations on the survival time t.

For inference about \beta, the marginal distribution of the
ranks is available and the marginal likelihood is
proportional to the probability that the rank vector should be
observed. That is, the marginal likelihood is proportional to
    P(r; \beta) = P(r = [(1), (2), ..., (n)]; \beta)

      = \int_0^\infty \int_{t_{(1)}}^\infty ... \int_{t_{(n-1)}}^\infty \prod_{i=1}^{n} f(t_{(i)}; x_{(i)}) dt_{(n)} ... dt_{(1)}

      = \exp(\sum_{i=1}^{n} x_{(i)}\beta) / \prod_{i=1}^{n} [\sum_{l \in R(t_{(i)})} \exp(x_l\beta)]

where R(t_{(i)}) is the set of labels attached to the
individuals at risk just prior to t_{(i)},

i.e. R(t_{(i)}) = {(i), (i+1), ..., (n)}. Some modification is
required for handling censored data. If all the items are
simultaneously put on test and followed to the kth failure
time (type II censoring), a marginal likelihood is again
easily obtained.

In this case the group acts transitively on the censoring time
and the invariant in the sample space is the first k rank
variables [(1), (2), ..., (k)].

The argument could be extended to progressive type II
censoring patterns, where items are withdrawn from test with
each failure.
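
The rank probability above defines a proper distribution over all possible failure orders, which can be verified by enumeration for a small sample. The sketch below assumes a single scalar covariate per item, no ties and no censoring; the function names and values are hypothetical.

```python
import math
from itertools import permutations


def rank_probability(order, x, beta):
    """Probability that the items fail in the given order of labels."""
    scores = [math.exp(x[label] * beta) for label in order]
    prob = 1.0
    for i in range(len(order)):
        # The risk set at the ith failure holds the items not yet failed.
        prob *= scores[i] / sum(scores[i:])
    return prob


def all_rank_probabilities(x, beta):
    """Probabilities of every possible rank vector for the sample."""
    return {order: rank_probability(order, x, beta)
            for order in permutations(range(len(x)))}


if __name__ == "__main__":
    x = [0.2, -1.0, 0.5, 1.3]  # illustrative covariate values only
    probs = all_rank_probabilities(x, 0.8)
    print(max(probs, key=probs.get), sum(probs.values()))
```

With \beta = 0 every order is equally likely, and with \beta > 0 orders in which items with large covariate values fail early receive more weight.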


1.7.2 METHOD OF PARTIAL LIKELIHOOD :

The method of Partial Likelihood was proposed by Cox
(1975) for the proportional hazard models.

Consider the set R(t_{(i)}) of individuals at risk at
t_{(i)} - 0. The conditional probability that item (i) fails at
t_{(i)}, given that the items R(t_{(i)}) are at risk and that
exactly one failure occurs at t_{(i)}, is

    \lambda(t_{(i)}; x_{(i)}) / \sum_{l \in R(t_{(i)})} \lambda(t_{(i)}; x_l)
        = \exp(x_{(i)}\beta) / \sum_{l \in R(t_{(i)})} \exp(x_l\beta)    ...... (1.7.1)

for i = 1, 2, ..., k.

The partial likelihood for \beta is now formed by taking the
product over all failure points of (1.7.1)

    L(\beta) = \prod_{i=1}^{k} [\exp(x_{(i)}\beta) / \sum_{l \in R(t_{(i)})} \exp(x_l\beta)]    ...... (1.7.2)


which is identical to the marginal likelihood obtained in
Section 1.7.1. It has been shown by Cox that the method used to
construct this likelihood gives maximum partial likelihood
estimates that are consistent and asymptotically normally
distributed, with asymptotic covariance matrix estimated
consistently by the inverse of the matrix of second partial
derivatives of the log likelihood function.

Taking logarithms on both sides of eq (1.7.2)

    \log L(\beta) = \sum_{i=1}^{k} [x_{(i)}\beta - \log \sum_{l \in R(t_{(i)})} \exp(x_l\beta)]

It follows that the same asymptotic results for \beta hold for
estimation from partial likelihood as for the usual
likelihood function.

If ties are present in the data the partial likelihood can be
obtained by applying a similar argument to the discrete
logistic model. For this model the hazard relationship is
given by


    \frac{\lambda(t; x) dt}{1 - \lambda(t; x) dt} = \frac{\lambda_d(t) dt}{1 - \lambda_d(t) dt} e^{x\beta}    ...... (1.7.3)

where \lambda_d(t) is an unspecified discrete hazard giving
positive contributions at the observed failure times
t_{(1)}, t_{(2)}, ..., t_{(k)}.

A direct generalisation of the above argument can then be
used to compute, at each failure time, the probability that
the failures should be those observed, given the risk set and
the multiplicity d_i.

A simple computation gives the conditional probability as
the ith term in the product

    \prod_{i=1}^{k} [\exp(s_i\beta) / \sum_{l \in R_{d_i}(t_{(i)})} \exp(s_l\beta)]    ...... (1.7.4)

where s_i is the sum of the covariates associated with the
d_i failures at t_{(i)}, s_l = \sum_{j=1}^{d_i} x_{l_j} for a subset
l = (l_1, ..., l_{d_i}), and R_{d_i}(t_{(i)}) is the set of all
subsets of d_i items chosen from the risk set R(t_{(i)}) without
replacement.

The partial likelihood (1.7.4) does not give rise to a
consistent estimate of the parameter \beta in (1.7.1) if the
ties arise by the grouping of continuous failure times. This
inconsistency in the partial likelihood occurs since (1.7.4)
must be thought of as arising from the discrete model (1.7.3)
and so estimates the odds ratio parameter \beta in that model.
Since (1.7.3) does not arise as a grouping of the continuous
model, the two parameters do not have identical
interpretations.
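
For data without ties the logarithm of the partial likelihood (1.7.2) is simple to evaluate. The following sketch assumes one scalar covariate per individual and right censoring indicated by an event flag; the function name and the data are hypothetical.

```python
import math


def partial_log_likelihood(times, events, x, beta):
    """Log of (1.7.2); events[i] is 1 for a failure and 0 for a censored time."""
    total = 0.0
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            continue
        # Risk set: individuals still under observation just before t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        total += x[i] * beta - math.log(sum(math.exp(x[j] * beta) for j in risk))
    return total


if __name__ == "__main__":
    times = [2.0, 5.0, 1.0, 7.0]   # illustrative data only
    events = [1, 0, 1, 1]
    x = [0.5, -0.3, 1.2, 0.0]
    for beta in (-1.0, 0.0, 1.0):
        print(beta, partial_log_likelihood(times, events, x, beta))
```

Only the ordering of the times enters the computation, which reflects the invariance of the problem under monotone transformations of the time scale.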


1.7.3 BRESLOW'S MAXIMUM LIKELIHOOD METHOD :

The hazard function is approximated by a step
function with discontinuities at each observed failure time

i.e.    \lambda_0(t) = \lambda_i,   t_{(i-1)} < t \le t_{(i)}    ...... (1.7.5)

(i = 1, ..., k+1), where t_{(0)} = 0 and t_{(k+1)} = \infty.

If an individual is censored in the interval
[t_{(i-1)}, t_{(i)}), it is taken to have been censored at t_{(i-1)}.

The likelihood on the data is

    L(\beta, \lambda_0) = \prod_{i=1}^{k} { [\lambda_0(t_{(i)})]^{d_i} \exp(s_i\beta) \prod_{l \in H(t_{(i)})} \exp[-e^{x_l\beta} \int_0^{t_{(i)}} \lambda_0(u) du] }    ...... (1.7.6)

where H(t_{(i)}) is the set of labels attached to the
individuals either failing or censored at t_{(i)}. Using
(1.7.5), (1.7.6) reduces to

    \prod_{i=1}^{k} \lambda_i^{d_i} \exp(s_i\beta) \exp[-\lambda_i (t_{(i)} - t_{(i-1)}) \sum_{l \in R(t_{(i)})} e^{x_l\beta}]    ...... (1.7.7)

as the likelihood of \beta and \lambda_1, ..., \lambda_k jointly. No
information is contained in the data about \lambda_{k+1}. For
fixed \beta, (1.7.7) can be maximised with respect to \lambda_i
(i = 1, 2, ..., k) at :

    \hat{\lambda}_i = d_i / [(t_{(i)} - t_{(i-1)}) \sum_{l \in R(t_{(i)})} e^{x_l\beta}]

Substitution in (1.7.7) gives the maximum likelihood
function of \beta as proportional to

    \prod_{i=1}^{k} \exp(s_i\beta) / [\sum_{l \in R(t_{(i)})} \exp(x_l\beta)]^{d_i}

This result is obtained by this approach as the
appropriate result when ties are present in the data, whereas
in the marginal or partial likelihood approach it is
obtained as an approximation to the exact result.
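
The step-function approach can be sketched in a few lines. The code below assumes a piecewise constant base line hazard between successive distinct failure times, one scalar covariate per individual, and censored individuals counted in the risk set only up to the last failure time not exceeding their censoring time; the function name and the data are hypothetical.

```python
import math


def breslow_fit(times, events, x, beta):
    """Return the step hazards and the tie-adjusted log-likelihood at beta."""
    distinct = sorted({t for t, e in zip(times, events) if e})
    hazards, log_lik, previous = [], 0.0, 0.0
    for t_i in distinct:
        # d_i failures at t_i and the sum s_i of their covariates.
        d_i = sum(1 for t, e in zip(times, events) if e and t == t_i)
        s_i = sum(xj for t, e, xj in zip(times, events, x) if e and t == t_i)
        # Sum of exp(x_l * beta) over the risk set at t_i.
        risk = sum(math.exp(xj * beta) for t, xj in zip(times, x) if t >= t_i)
        hazards.append(d_i / ((t_i - previous) * risk))
        log_lik += s_i * beta - d_i * math.log(risk)
        previous = t_i
    return hazards, log_lik


if __name__ == "__main__":
    times = [1.0, 1.0, 2.0, 3.0]   # illustrative data with one tie
    events = [1, 1, 1, 0]
    x = [0.4, -0.2, 1.0, 0.3]
    print(breslow_fit(times, events, x, 0.5))
```

With d_i = 1 at every failure time the returned log-likelihood coincides with the logarithm of the partial likelihood for untied data.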

This approach can be criticised on the grounds that the data
are in effect determining the model, which is specified given
the failure times t_{(1)}, t_{(2)}, ..., t_{(k)}.

It is also well known that maximizing a likelihood over a
large number of nuisance parameters can lead to misleading
and biased results. Similar results can be obtained once the
model (1.7.5) is selected by the Bayesian approach, provided
the distributions for \log \lambda_i (i = 1, 2, ..., k) are taken
to be independent uniform priors on (-\infty, \infty),
independently of the proper prior p(\beta).






1.8 : BAYESIAN INFERENCE :

We know that in parametric inference, the form of
the population p.d.f. f(x, \theta) is known while the parameter
\theta is unknown; however, we agree upon the parameter space,
i.e. the set of all possible values of the parameter, which we
denote by \Omega. For example, in case of the exponential density
function

    f(x | \theta) = (1/\theta) e^{-x/\theta} ;   x > 0, \theta > 0

the parameter space is \Omega = {\theta | \theta > 0}.


In classical estimation theory the estimate of \theta depends
only on the sample values which we draw from f(x, \theta) and as
such only the information about \theta provided by the data is
taken into consideration. However, there may be situations
in which we may wish to incorporate information about \theta from
other sources as well. This additional information is called
subjective judgment about the unknown parameter \theta and can be
combined with sample data using Bayes' Theorem if it is
expressible in the form of a probability distribution. There
are cases in which \theta can be regarded as a random variable
with p.d.f. g(\theta). For example, in the case of the exponential
model, the mean life \theta may be regarded as varying from batch
to batch over time and this variation may be represented by
a probability distribution over \Omega.

Suppose that n items are placed on a test. It is assumed
that their recorded lifetimes form a random sample, say
$X_1, X_2, \ldots, X_n$, which follow a distribution with
p.d.f. $f(x; \theta)$. To be specific we will assume $\theta$ to be real
valued. Consider $\theta$ itself as a random variable with p.d.f.
$g(\theta)$. Thus, the failure time p.d.f. $f(x; \theta)$ can be regarded
as a conditional p.d.f. of $x$ given $\theta$, i.e. $f(x \mid \theta)$, where the
marginal p.d.f. of $\theta$ is given by $g(\theta)$.

Therefore, the joint p.d.f. of $(X_1, X_2, \ldots, X_n, \theta)$ is expressed
as

$$f(x_1, \ldots, x_n, \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)\, g(\theta) = L(x_1, \ldots, x_n \mid \theta)\, g(\theta) \tag{1.8.1}$$


The marginal p.d.f. of $(X_1, X_2, \ldots, X_n)$ is given by

$$f(x_1, \ldots, x_n) = \int_{\Omega} L(x_1, \ldots, x_n \mid \theta)\, g(\theta)\, d\theta \tag{1.8.2}$$


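and the conditional p.d.f. of $\theta$ given the data $(X_1, X_2, \ldots, X_n)$ is
given by

$$\Pi(\theta \mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n, \theta)}{f(x_1, \ldots, x_n)} = \frac{L(x_1, \ldots, x_n \mid \theta)\, g(\theta)}{\int_{\Omega} L(x_1, \ldots, x_n \mid \theta)\, g(\theta)\, d\theta} \tag{1.8.3}$$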


Thus prior to obtaining the data $(X_1, X_2, \ldots, X_n)$ the
variations in $\theta$ were represented by $g(\theta)$, known as the prior
distribution of $\theta$; however, after the data $(X_1, X_2, \ldots, X_n)$ have
been observed, in the light of the new information the
variations in $\theta$ are represented by $\Pi(\theta \mid x_1, \ldots, x_n)$, the
posterior distribution of $\theta$. The uncertainty about the
parameter $\theta$ prior to the experiment is represented by the
prior p.d.f. $g(\theta)$ and the same after the experiment is
represented by the posterior p.d.f. $\Pi(\theta \mid x_1, \ldots, x_n)$ in
(1.8.3).

In case the prior distribution of $\theta$ is discrete, the integral
sign in (1.8.3) is replaced by summation over $\theta$. This
approach to developing the posterior distribution is known
as Bayesian after the Reverend Thomas Bayes, an English minister
who lived in the 18th century. The process is a straightforward
application of Bayes' Theorem. Once the posterior
distribution has been obtained, it becomes the main object
of study.

Any statistical inference about $\theta$ may be drawn with the help of
this posterior distribution.

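Under the squared error loss function

$$L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2 \tag{1.8.4}$$

where $\hat{\theta}$ is an estimate of $\theta$, the Bayes point estimate
$\hat{\theta}$ of $\theta$ is defined as the posterior expectation of
$\theta$ given the data $X = (X_1, X_2, \ldots, X_n)$, i.e.

$$\hat{\theta} = E[\theta \mid X] = \int_{\Omega} \theta\, \Pi(\theta \mid X)\, d\theta \tag{1.8.5}$$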


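It can be observed that for $\hat{\theta} = \theta$, the loss function
defined above will attain its minimum value. A $100(1-\alpha)\%$
Bayes confidence interval $(\theta_1, \theta_2)$ for $\theta$ is obtained from

$$\int_{\theta_1}^{\theta_2} \Pi(\theta \mid X)\, d\theta = 1 - \alpha \tag{1.8.6}$$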


For testing the hypothesis $H_0 : \theta \in \Omega_0$ vs $H_1 : \theta \in \Omega_1$, where
$\Omega_0$ and $\Omega_1$ are mutually exclusive sets of the parametric space
$\Omega$, we can make two decisions $D_0$ and $D_1$; $D_0$ means $H_0$ is true
while $D_1$ implies $H_1$ is true. Now for wrong decisions, we have
to anticipate some positive loss defined by

$L(\theta, D_0)$ = loss incurred when decision $D_0$ is made and $\theta$ is the
true value of the parameter,

$$L(\theta, D_0) = \begin{cases} 0, & \theta \in \Omega_0 \\ a(\theta), & \theta \in \Omega_1 \end{cases}$$

$L(\theta, D_1)$ = loss incurred when decision $D_1$ is made and $\theta$ is the
true value of the parameter,

$$L(\theta, D_1) = \begin{cases} b(\theta), & \theta \in \Omega_0 \\ 0, & \theta \in \Omega_1 \end{cases}$$

Now in the Bayes testing procedure we reject $H_0$ if the average loss
for decision $D_0$ is greater than the average loss for decision
$D_1$, i.e. if

$$\int_{\Omega_1} a(\theta)\, \Pi(\theta \mid x)\, d\theta > \int_{\Omega_0} b(\theta)\, \Pi(\theta \mid x)\, d\theta \tag{1.8.7}$$

From equation (1.8.3), we have

$$\Pi(\theta \mid x_1, \ldots, x_n) \propto L(x_1, \ldots, x_n \mid \theta)\, g(\theta)$$

which shows that for large samples the posterior is dominated
more by the likelihood $L(x_1, \ldots, x_n \mid \theta)$ than by the prior
$g(\theta)$. Therefore, as n becomes large, the Bayes
estimate of $\theta$ will tend to its classical estimate, showing
thereby that in large samples the choice of the prior
distribution is not very crucial.
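The large-sample remark can be checked numerically. The following is a minimal sketch, assuming the exponential model with mean life theta from the example above, a hypothetical exponential prior with mean 5 on theta, and made-up lifetimes; the posterior (1.8.3) is evaluated on a grid rather than in closed form, and the Bayes estimate is the posterior mean (1.8.5).

```python
import math


def prior(theta):
    """Hypothetical prior g(theta): exponential density with mean 5."""
    return math.exp(-theta / 5.0) / 5.0


def posterior_on_grid(data, grid):
    """Normalised posterior weights on a grid, a numerical version of (1.8.3)."""
    n, total = len(data), sum(data)
    # log of L(x | theta) * g(theta) for f(x | theta) = (1/theta) exp(-x/theta)
    log_post = [-n * math.log(t) - total / t + math.log(prior(t)) for t in grid]
    peak = max(log_post)  # subtract the peak to avoid numerical underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]


def bayes_estimate(data, grid):
    """Posterior mean of theta: the Bayes estimate under squared error loss."""
    post = posterior_on_grid(data, grid)
    return sum(t * w for t, w in zip(grid, post))


grid = [0.01 * k for k in range(1, 5001)]  # theta values in (0, 50]
small_sample = [1.2, 0.7, 3.1]             # illustrative lifetimes, not thesis data
large_sample = small_sample * 100          # same sample mean, n = 300

mle = sum(small_sample) / len(small_sample)  # classical estimate of the mean life
bayes_small = bayes_estimate(small_sample, grid)
bayes_large = bayes_estimate(large_sample, grid)
print(mle, bayes_small, bayes_large)
```

With three observations the prior pulls the estimate well away from the sample mean, while with 300 observations the two nearly coincide, which is the behaviour described in the text.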


1.9 : A BAYESIAN ANALYSIS OF THE PROPORTIONAL HAZARDS MODEL :

This section gives a brief outline of a non-parametric
Bayesian analysis of the survival time data
arising from the proportional hazards model.

1.9.1 : NON-PARAMETRIC BAYESIAN PROCEDURES :


Here we consider specific applications of some
non-parametric Bayesian procedures to survival distributions.
Suppose that the (conditional) survivor function of the
random variable T is

$$P(T \ge t \mid \Lambda_0) = \exp[-\Lambda_0(t)]$$

which is conditional on $\Lambda_0(\cdot)$. Since $\Lambda_0$ is the parameter in
the model, it is the realisation of a stochastic process to be
defined. Consider a partition of $[0, \infty)$ into a finite number
k of disjoint intervals $[a_0 = 0, a_1), [a_1, a_2), \ldots, [a_{k-1}, a_k = \infty)$
and define the hazard contribution of the ith interval as

$$q_i = P\left(T \in [a_{i-1}, a_i) \mid T \ge a_{i-1}, \Lambda_0\right) \tag{1.9.1}$$

if $P(T \ge a_{i-1} \mid \Lambda_0) > 0$; otherwise $q_i = 1$ $(i = 1, 2, \ldots, k)$. Then

$$\Lambda_0(a_i) = \sum_{j=1}^{i} -\log(1 - q_j) = \sum_{j=1}^{i} r_j \tag{1.9.2}$$

where $r_j = -\log(1 - q_j)$.
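As a small numerical illustration of (1.9.2), the sketch below accumulates $r_j = -\log(1 - q_j)$ into the cumulative hazard and converts it back into survival probabilities. Here `q` stands for the hazard contributions of the first few intervals and `Lambda` for the cumulative hazard at the interval end points; the numerical values are hypothetical and serve only to show the bookkeeping.

```python
import math


def cumulative_hazard(q):
    """Cumulative hazard at each interval end point from hazard contributions q_j < 1."""
    total, out = 0.0, []
    for q_j in q:
        total += -math.log(1.0 - q_j)  # r_j = -log(1 - q_j)
        out.append(total)
    return out


q = [0.10, 0.25, 0.50]                      # hypothetical hazard contributions
Lambda = cumulative_hazard(q)
survival = [math.exp(-L) for L in Lambda]   # probability of surviving past each end point
print(Lambda, survival)
```

The survival probability at the end of the third interval equals the product (1 - 0.10)(1 - 0.25)(1 - 0.50), which is why the hazard contributions add on the log scale.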

Doksum (1974) has considered this situation and has shown
that a probability distribution can be specified on the
space $\{\Lambda_0(t)\}$ by specifying the finite dimensional
distributions of $q_1, q_2, \ldots, q_k$ for each partition $[a_{i-1}, a_i)$,

CHAPTER-2


A DISCRETE SOFTWARE RELIABILITY GROWTH MODEL WITH 
LEADING AND DEPENDENT ERRORS 

2.0 : INTRODUCTION : 

Software reliability modelling is very important due
to the fact that it is not possible to produce fault free
software. The faults in the software occur due to human
imperfection. These faults manifest themselves in terms of
failures when the software is run. The testing phase in the
software development process aims at detecting and removing
these faults (errors) and making the software more reliable.
Thus it is very important to evaluate software reliability
during the testing phase, based on software error data analysis.
Models concerned with the relationship between the cumulative
number of errors detected through software testing and the time
span of testing are called software reliability growth
models (SRGMs). Based on the non-homogeneous Poisson process
(NHPP), several SRGMs [1, 4-6, 8-9] have been developed to
predict remaining errors in the software and to evaluate
measures such as mean time between failures, software
reliability etc. Moreover, each software system is developed
for a different objective and so it is not possible to
develop an SRGM which can analyse failure data for all
software systems.

Most of the SRGMs developed use calendar time or CPU time as
the unit of software error detection/removal period.
However, at times the number of test runs can be a more
appropriate unit of software fault detection/removal period.
Such an SRGM is called a discrete SRGM and relates the
number of faults detected/removed to the number of test runs
during the testing phase. A test run can be a single
computer test run or a series of computer test runs executed
in an hour, day, week or even month. Very few discrete
SRGMs have been developed in the literature.

Mostly, the error detection/removal phenomenon has been
described by exponential or s-shaped SRGMs. The s-shaped
error removal phenomenon can be attributed to error
dependency [6] or to the time lag between the failure due to an
error and its subsequent removal [8]. Bittanti [1]
attributes s-shapedness to increased error removal during
the later part of the testing phase. None of these models,
however, describes the interface between independent leading
errors and errors whose removal is dependent on these
leading errors.

In this chapter, we propose a new discrete SRGM assuming
the software contains two types of errors, leading and
dependent. A leading error is defined as one that is
immediately removed on its causing a failure. A dependent
error is termed as one whose removal is delayed until the
corresponding leading error is removed. The removal of a
leading error helps in isolating the cause of failure of its
corresponding dependent error. Applicability of the model
has been shown by applying it to several software error data
sets cited in [2].

Besides modelling a software error detection process, it is
also of utmost importance to know when to stop testing and
release the software for use. Several criteria have been
suggested in this regard [3-5, 7]. In this chapter, we also
discuss a release policy for the proposed discrete SRGM by
minimising cost subject to the discrete failure intensity not
exceeding a specified value. We first estimate the
parameters of the proposed SRGM by the method of maximum
likelihood using software error data cited in [2]. Using the
estimated values, we discuss the optimal release policy
based on cost and intensity criteria. Results are
illustrated by numerical examples.

2.1 : ASSUMPTIONS :

1. Software is subject to failures at random test runs
caused by errors remaining in the software.

2. The error removal phenomenon is modelled by NHPP.

3. When a failure occurs, an immediate effort is made to
detect the error causing the failure and remove the error.

4. The errors in the software are divided into two
categories : Leading (Independent) Faults and Dependent
Faults.

5. The number of errors in the software is finite and is
the sum of leading and dependent errors.

6. The expected discrete failure intensity for leading
errors is proportional to the current remaining leading
errors.

7. The expected discrete failure intensity for dependent
errors is proportional to the current remaining
dependent errors and the ratio of leading errors
removed to the total number of errors.

8. The error removal process does not introduce any new
errors in the software.

9. Software life cycle is assumed to be more than the
optimal number of test cases/runs before releasing the
software.

10. The software is never released without testing.

11. Corresponding to the error detection phenomenon at the
manufacturer/user end, there exists an equivalent error
detection phenomenon at the user/manufacturer end.


2.2 : NOTATIONS :

$a$ : expected initial error content in the software, $a > 0$.

$a_1$ ($a_2$) : expected initial leading (dependent) error
content in the software, $a_1 > 0$, $a_2 > 0$, $a = a_1 + a_2$.

$b$ ($c$) : proportionality constant for leading
(dependent) errors, $0 < b < 1$, $0 < c < 1$.

$p$ : proportion of leading errors in the software,
$0 < p \le 1$, $a_1 = p\,a$.

$N$ : test run lag between removal of leading and
dependent errors, $N \ge 0$.

$m_1(n)$ ($m_2(n)$) : mean value function for leading (dependent)
errors, $m_1(n) = 0$ for $n \le 0$, $m_1(\infty) = a_1$,
$m_2(n) = 0$ for $n \le N$, $m_2(\infty) = a_2$.

$m(n)$ : $m_1(n) + m_2(n)$.

$\lambda(n)$ : failure intensity for $m(n)$ ($\lambda(0) = 0$).

$C_1$ ($C_2$) : cost of fixing a leading error before (after)
release of the software, $C_2 > C_1 > 0$.

$C_3$ ($C_4$) : cost of fixing a dependent error before
(after) release of the software, $C_4 > C_3 > 0$.

$C_5$ : cost of a test run.

$C(n)$ : total expected software cost incurred during the
software life cycle, when the software is
released after n test runs.

$n_l$ : software life cycle in terms of number of
runs.

$\lambda_0$ : desired failure intensity.

$n^*$ : optimal number of test runs executed before
releasing the software.


2.3 : MODEL ANALYSIS : 

The number of leading errors removed on the $(n+1)$th test run
as per assumption (6) may be expressed as

$$m_1(n+1) - m_1(n) = b\,(a_1 - m_1(n)) \tag{2.1}$$

Solving (2.1) under the initial condition $m_1(0) = 0$, we
have

$$m_1(n) = a_1\left(1 - (1-b)^n\right) \tag{2.2}$$

Equation (2.2) models the leading error removal phenomenon.

The failure intensity for leading errors is given by

$$a_1 b\,(1-b)^n$$

It may be noted that the failure intensity for leading errors
decreases as n increases.

The number of dependent errors removed on the $(n+1)$th test
run according to assumption (7) is expressed as

$$m_2(n+1) - m_2(n) = c\,(a_2 - m_2(n))\,\frac{m_1(n+1-N)}{a} \tag{2.3}$$

Solving (2.3) under the initial condition $m_2(0) = 0$, we get

$$m_2(n) = a_2\left[1 - \prod_{i=0}^{n-N}\left(1 - \frac{c\,a_1}{a}\left(1 - (1-b)^i\right)\right)\right] \tag{2.4}$$

The failure intensity for dependent errors is given as

$$\frac{a_2\,c\,a_1}{a}\left(1 - (1-b)^{n+1-N}\right)\prod_{i=0}^{n-N}\left(1 - \frac{c\,a_1}{a}\left(1 - (1-b)^i\right)\right)$$

It may be noted that the failure intensity for dependent errors
increases for all n $(> N)$ satisfying


and then decreases.


Now

$$m(n) = m_1(n) + m_2(n) = a - a_1(1-b)^n - a_2\prod_{i=0}^{n-N}\left(1 - \frac{c\,a_1}{a}\left(1 - (1-b)^i\right)\right)$$

For simplicity, we assume the test run lag to be negligible,
i.e. $N = 0$, so

$$m(n) = a - a_1(1-b)^n - a_2\prod_{i=0}^{n}\left(1 - \frac{c\,a_1}{a}\left(1 - (1-b)^i\right)\right) \tag{2.5}$$

$$m(n) = a\left[1 - p\,(1-b)^n - (1-p)\prod_{i=0}^{n}\left(1 - p\,c\left(1 - (1-b)^i\right)\right)\right] \tag{2.6}$$

Equation (2.5) or (2.6) represents the expected number of
errors removed in n test runs.
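The failure intensity for m(n) is

$$\lambda(n+1) = m(n+1) - m(n) = a_1 b\,(1-b)^n + \frac{a_2\,c\,a_1}{a}\left(1 - (1-b)^{n+1}\right)\prod_{i=0}^{n}\left(1 - \frac{c\,a_1}{a}\left(1 - (1-b)^i\right)\right) \tag{2.7}$$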


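It may be noted that either $\lambda(n)$ decreases for all $n \ge 1$
or increases for $n \le n_\lambda$ and decreases for $n > n_\lambda$, where
$n_\lambda$ $(> 1)$ satisfies

$$\lambda(n_\lambda - 1) < \lambda(n_\lambda) > \lambda(n_\lambda + 1)$$

Thus, $\lambda(n)$ is maximum for $n = n_\lambda$.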
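The mean value functions and the failure intensity can be evaluated directly. The following is a minimal sketch, assuming a negligible test run lag (N = 0) and writing $a_1 = p\,a$, $a_2 = (1-p)\,a$; the function names are illustrative, and the parameter values are the DS2 estimates reported in Section 2.4.

```python
def m1(n, a, b, p):
    """Expected leading errors removed by run n, equation (2.2) with a1 = p * a."""
    return a * p * (1.0 - (1.0 - b) ** n) if n > 0 else 0.0


def m2(n, a, b, c, p):
    """Expected dependent errors removed by run n, equation (2.4) with N = 0."""
    product = 1.0
    for i in range(n + 1):
        # c * a1 / a equals p * c because a1 = p * a
        product *= 1.0 - p * c * (1.0 - (1.0 - b) ** i)
    return a * (1.0 - p) * (1.0 - product)


def m(n, a, b, c, p):
    """Total expected errors removed by run n, m(n) = m1(n) + m2(n)."""
    return m1(n, a, b, p) + m2(n, a, b, c, p)


def intensity(n, a, b, c, p):
    """Discrete failure intensity lambda(n) = m(n) - m(n-1), with lambda(0) = 0."""
    if n <= 0:
        return 0.0
    return m(n, a, b, c, p) - m(n - 1, a, b, c, p)


DS2 = dict(a=1385.0, b=0.1339, c=0.0743, p=0.8555)
for n in (1, 10, 26, 49):
    print(n, round(m(n, **DS2), 1), round(intensity(n, **DS2), 3))
```

Evaluating the product term by term keeps the code close to (2.4) and (2.6); for long test histories a running product would avoid recomputing it for every n.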


2.4 : PARAMETER ESTIMATION : 

The proposed mean value function m(n) has four
unknown parameters. To estimate the parameters, we use the
method of maximum likelihood.

Suppose data is available for k observed pairs $(n_i, y_i)$
$(i = 1, 2, \ldots, k)$, where $y_i$ is the cumulative number of faults
removed by $n_i$ test runs $(0 < y_1 \le y_2 \le \cdots \le y_k)$,
$(0 < n_1 < n_2 < \cdots < n_k)$. The likelihood function for the unknown
parameters with m(n) in (2.6) is

$$L\left(a, b, c, p \mid (n_i, y_i)\right) = \exp[-m(n_k)] \prod_{i=1}^{k} \frac{\left[m(n_i) - m(n_{i-1})\right]^{y_i - y_{i-1}}}{(y_i - y_{i-1})!} \tag{2.8}$$

with $n_0 = 0$ and $y_0 = 0$. Taking the log of (2.8), we get

$$\ln L\left(a, b, c, p \mid (n_i, y_i)\right) = -m(n_k) + \sum_{i=1}^{k} (y_i - y_{i-1}) \ln\left[m(n_i) - m(n_{i-1})\right] - \sum_{i=1}^{k} \ln\left[(y_i - y_{i-1})!\right] \tag{2.9}$$

From (2.9), maximum likelihood estimates of a, b, c and p are
obtained using the DNCONF subroutine of the IMSL MATH Library, under
the following constraints

0 < a < $\infty$

0 < b < 1

0 < c < 1

0 < p < 1
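The log-likelihood of grouped NHPP data can be written down compactly. The sketch below is not the IMSL routine named in the text; it only evaluates the log-likelihood for the mean value function (2.6), returns minus infinity outside the stated constraints, and uses synthetic counts generated from the model itself (not one of the thesis data sets) so that the example is self-contained.

```python
import math


def mean_value(n, a, b, c, p):
    """m(n) of equation (2.6), with the test run lag N taken as 0."""
    product = 1.0
    for i in range(n + 1):
        product *= 1.0 - p * c * (1.0 - (1.0 - b) ** i)
    return a * (1.0 - p * (1.0 - b) ** n - (1.0 - p) * product)


def log_likelihood(params, runs, counts):
    """Log-likelihood of cumulative counts y_i observed after n_i test runs."""
    a, b, c, p = params
    if not (a > 0 and 0 < b < 1 and 0 < c < 1 and 0 < p < 1):
        return float("-inf")  # outside the constraint region
    total, prev_m, prev_y = 0.0, 0.0, 0
    for n_i, y_i in zip(runs, counts):
        m_i = mean_value(n_i, a, b, c, p)
        d = y_i - prev_y                      # faults removed in this interval
        total += d * math.log(m_i - prev_m) - math.lgamma(d + 1)
        prev_m, prev_y = m_i, y_i
    return total - prev_m                     # prev_m now holds m(n_k)


true_params = (1385.0, 0.1339, 0.0743, 0.8555)
runs = list(range(1, 16))
counts = [round(mean_value(n, *true_params)) for n in runs]   # synthetic data

ll_true = log_likelihood(true_params, runs, counts)
ll_off = log_likelihood((1385.0, 0.05, 0.0743, 0.8555), runs, counts)
print(ll_true, ll_off)
```

Any general-purpose constrained optimiser could then be applied to the negative of `log_likelihood`; the choice of optimiser is left open here.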

We have applied the proposed model to the following four
real discrete software failure data sets cited in [2].

DS1 - The failure data is for a command, control and
communication system software tested for twelve
months. During this period 2657 errors were
removed.

DS2 - The software failure data is for a command and
control system software. The software was tested
for fifteen weeks and 1138 errors were removed.

DS3 - The data is for a communication and control system
software tested for fifteen weeks during which
period 1483 errors were removed.

DS4 - This data set is also for a command and control
system software tested for fifteen weeks and 2702
errors were removed.

The following table gives the maximum likelihood estimates
of the proposed model parameters a, b, c and p for the four
data sets described above :


                     ESTIMATES OF

DATA SETS      a         b         c         p

DS1            3115      0.1642    0.2104    0.7929

DS2            1385      0.1339    0.0743    0.8555

DS3            2562      0.4407    0.2533    0.1855

DS4            3942      0.0742    0.0       1.0

From the estimates of b, c and p obtained for the four
failure data sets, it is observed that DS1 and DS2 have a
high percentage of leading errors, while DS3 has a low
percentage of leading errors and DS4 does not have any
dependent errors and all the errors are leading. The model
for the DS4 data set reduces to a discrete exponential model.
From the estimates obtained for the parameters, it is
evident that the software either may not contain any
dependent errors or it may contain a varying proportion of
dependent errors. The above mentioned data sets simply
justify the existence of such a discrete software error
removal phenomenon. Figures 1 to 4 show the graphs of actual
and estimated failures for the above data sets.


2.5 : OPTIMAL SOFTWARE RELEASE PROBLEM :

It is very important for the software manager to know
when to stop testing and release the software for use.
Several researchers have studied this problem based on
different criteria for release [4-5 and references cited
therein]. In this chapter, we obtain the optimal number of
test runs executed before release such that the total
expected software cost during the life cycle of the software
is minimised subject to the discrete failure intensity not
exceeding a prespecified value.

Mathematically, we may say

Minimize

$$C(n) = C_1 m_1(n) + C_2\left[m_1(n_l) - m_1(n)\right] + C_3 m_2(n) + C_4\left[m_2(n_l) - m_2(n)\right] + C_5\,n \tag{2.10}$$

subject to

$$\lambda(n) \le \lambda_0 \tag{2.11}$$


From (2.10)

$$C(n+1) - C(n) = -(C_2 - C_1)\left[m_1(n+1) - m_1(n)\right] + (C_3 - C_4)\left[m_2(n+1) - m_2(n)\right] + C_5$$

For simplicity, assuming $C_3 = C_1$ and $C_4 = C_2$, we have

$$C(n+1) - C(n) = -(C_2 - C_1)\,\lambda(n+1) + C_5 \tag{2.12}$$

To study the behaviour of the cost function, we consider the
following cases

(a) When $\lambda(n)$ is decreasing for all n

if $C_5 > (C_2 - C_1)\,\lambda(1)$, minimum cost is achieved for n = 0,
else

if $C_5 = (C_2 - C_1)\,\lambda(1)$, minimum cost is achieved for n = 0
and 1, else

if $C_5 < (C_2 - C_1)\,\lambda(1)$ and there exists $n_1$ $(> 1)$
satisfying
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$, cost is minimum
when $n = n_1 - 1$, else

if $C_5 < (C_2 - C_1)\,\lambda(1)$ and there exists $n_1$ $(> 1)$
such that
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 = (C_2 - C_1)\,\lambda(n_1)$,
$C(n)$ is minimum for both $n = n_1$ and $n = n_1 - 1$.

(b) When $\lambda(n)$ increases for $n \le n_\lambda$ $(> 1)$ and decreases for
$n > n_\lambda$ such that $\lambda(n_\lambda - 1) < \lambda(n_\lambda) > \lambda(n_\lambda + 1)$

if $C_5 \ge (C_2 - C_1)\,\lambda(n_\lambda)$, minimum cost is achieved for n = 0,
else

if $C_5 < (C_2 - C_1)\,\lambda(n_\lambda)$, $C_5 < (C_2 - C_1)\,\lambda(1)$ and there
exists $n_1 > n_\lambda$ such that
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$, $C(n)$ is minimum
for $n = n_1 - 1$, while if there exists $n_1 > n_\lambda$ such that
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 = (C_2 - C_1)\,\lambda(n_1)$, $C(n)$ is minimum
for both $n = n_1$ and $n = n_1 - 1$, else

if $C_5 < (C_2 - C_1)\,\lambda(n_\lambda)$, $C_5 > (C_2 - C_1)\,\lambda(1)$ and there
exist $n_2 < n_\lambda$ and $n_1 > n_\lambda$ such that
$(C_2 - C_1)\,\lambda(n_2 - 1) < C_5 < (C_2 - C_1)\,\lambda(n_2)$ and
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$,
$C(n)$ increases for $n < n_2 - 1$ and is minimum for $n = n_1 - 1$,

else if $(C_2 - C_1)\,\lambda(n_2 - 1) < C_5 < (C_2 - C_1)\,\lambda(n_2)$ and
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 = (C_2 - C_1)\,\lambda(n_1)$,
$C(n)$ increases for $n \le n_2 - 1$ and is minimum for both $n = n_1$
and $n = n_1 - 1$.

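Now, for a specific intensity requirement $\lambda_0$ $(> 0)$,
if $\lambda(n)$ is decreasing for all n and $\lambda(1) > \lambda_0$, there
exists $n_i > 1$ such that

$$\lambda(n_i - 1) > \lambda_0 \ge \lambda(n_i),$$

else if $\lambda(n)$ is increasing for $n \le n_\lambda$ and decreasing for
$n > n_\lambda$ and $\lambda(n_\lambda) > \lambda_0$, there may exist $n_3$ and $n_4$
$(0 < n_3 < n_\lambda < n_4)$ such that

$$\lambda(n_3) \le \lambda_0, \quad \lambda(n_3 + 1) > \lambda_0 \tag{2.13}$$

$$\lambda(n_4) \le \lambda_0, \quad \lambda(n_4 - 1) > \lambda_0 \tag{2.14}$$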


Combining the cost and intensity requirements, we may state
the following theorem for optimal release policy (assuming a
unique n exists for minimum cost).

Theorem. Assume $C_2 > C_1 > 0$ and $C_5 > 0$.

(a) $\lambda(n)$ is decreasing for all $n \ge 1$

(i) $\lambda(1) \le \lambda_0$

if $C_5 > (C_2 - C_1)\,\lambda(1)$, $n^* = 1$

if $C_5 < (C_2 - C_1)\,\lambda(1)$ and there exists $n_1$ such that
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$, $n^* = n_1 - 1$

(ii) $\lambda(1) > \lambda_0$ and there exists $n_i > 1$ satisfying
$\lambda(n_i - 1) > \lambda_0 \ge \lambda(n_i)$

if $C_5 > (C_2 - C_1)\,\lambda(1)$, $n^* = n_i$

if $C_5 < (C_2 - C_1)\,\lambda(1)$ and there exists $n_1$ such that
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$,
$n^* = \max(n_i, n_1 - 1)$


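(b) $\lambda(n)$ increases for $n \le n_\lambda$ and decreases for $n > n_\lambda$

(i) $C_5 > (C_2 - C_1)\,\lambda(n_\lambda)$

if $\lambda(n_\lambda) \le \lambda_0$, $n^* = 1$

if $\lambda(n_\lambda) > \lambda_0$ and there exists $n_3$ satisfying (2.13), $n^* = 1$

if $\lambda(n_\lambda) > \lambda_0$ and there exists $n_4$ satisfying (2.14),
$n^* = n_4$

(ii) $C_5 < (C_2 - C_1)\,\lambda(n_\lambda)$, $C_5 < (C_2 - C_1)\,\lambda(1)$ and there
exists $n_1 > n_\lambda$ such that
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$

if $\lambda(n_\lambda) \le \lambda_0$, $n^* = n_1 - 1$

if $\lambda(n_\lambda) > \lambda_0$ and there exist $n_3$ and $n_4$
satisfying (2.13) and (2.14) respectively, then

    if $n_1 > n_4 + 1$, $n^* = n_1 - 1$, else

        if $C(n_3) < C(n_4)$, $n^* = n_3$
        if $C(n_3) > C(n_4)$, $n^* = n_4$
        if $C(n_3) = C(n_4)$, $n^* = n_3$ or $n_4$

if $\lambda(n_\lambda) > \lambda_0$ and there exists $n_4$ satisfying
(2.14) then

    if $n_1 > n_4 + 1$, $n^* = n_1 - 1$ else $n^* = n_4$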

(iii) $C_5 < (C_2 - C_1)\,\lambda(n_\lambda)$, $C_5 > (C_2 - C_1)\,\lambda(1)$ and there exist
$n_2 < n_\lambda$ and $n_1 > n_\lambda$ satisfying
$(C_2 - C_1)\,\lambda(n_2 - 1) < C_5 < (C_2 - C_1)\,\lambda(n_2)$ and
$(C_2 - C_1)\,\lambda(n_1 - 1) > C_5 > (C_2 - C_1)\,\lambda(n_1)$

if $\lambda(n_\lambda) \le \lambda_0$

    if $C(1) > C(n_1 - 1)$, $n^* = n_1 - 1$
    if $C(1) = C(n_1 - 1)$, $n^* = 1$ or $n_1 - 1$
    if $C(1) < C(n_1 - 1)$, $n^* = 1$

if $\lambda(n_\lambda) > \lambda_0$ and there exist $n_3$ and $n_4$ satisfying
(2.13) and (2.14) then

    if $C(1) > C(n_1 - 1)$

        if $n_1 > n_4 + 1$, $n^* = n_1 - 1$, else

        if $n_3 < n_2 - 1$ or ($n_3 > n_2 - 1$ and $C(n_3) > C(1)$)

            if $C(1) < C(n_4)$, $n^* = 1$
            if $C(1) = C(n_4)$, $n^* = 1$ or $n_4$
            if $C(1) > C(n_4)$, $n^* = n_4$

        else

            if $C(n_3) < C(n_4)$, $n^* = n_3$
            if $C(n_3) > C(n_4)$, $n^* = n_4$
            if $C(n_3) = C(n_4)$, $n^* = n_3$ or $n_4$

    if $C(1) = C(n_1 - 1)$

        if $n_1 > n_4 + 1$, $n^* = n_1 - 1$ or 1, else $n^* = 1$

    if $C(1) < C(n_1 - 1)$, $n^* = 1$

if $\lambda(n_\lambda) > \lambda_0$ and there exists $n_4 > n_\lambda$ satisfying (2.14)
then

    if $n_1 > n_4 + 1$, $n^* = n_1 - 1$ else $n^* = n_4$

Other cases can be similarly discussed.

We now discuss the optimal release policy for the software system
described by DS2. We assume $C_2 = C_4 = 10$, $C_5 = 40$, $n_l = 150$ and
$\lambda_0 = 0.999$. Using these values, we have $n_1 = 27$ ($C(n)$ is minimum
for $n = 26$) and $n_i = 49$. Using (ii) of (a) in the theorem, we
get $n^* = \max(n_i, n_1 - 1) = 49$. The intensity for $n = 49$ is 0.995
and the cost is 8948.4. If only cost was to be minimized, $n^*$
would have been 26, with cost 8378.3, but the intensity would have
been quite high (8.0). Figures 5 and 6 show the graphs of
cost and intensity functions respectively.
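The release decision for DS2 can also be explored by direct enumeration. The sketch below is a brute-force search over n = 1, ..., n_l, not the case analysis of the theorem; it assumes a negligible test run lag, the simplification $C_3 = C_1$, $C_4 = C_2$, and a pre-release fixing cost $C_1 = 5$, which is an assumption made here because that value is not legible in the text (it was chosen since it gives costs of roughly the reported size).

```python
def m(n, a, b, c, p):
    """Mean value function (2.6), test run lag N taken as 0."""
    product = 1.0
    for i in range(n + 1):
        product *= 1.0 - p * c * (1.0 - (1.0 - b) ** i)
    return a * (1.0 - p * (1.0 - b) ** n - (1.0 - p) * product)


def intensity(n, theta):
    """Discrete failure intensity lambda(n) = m(n) - m(n-1)."""
    return m(n, **theta) - m(n - 1, **theta) if n >= 1 else 0.0


def cost(n, theta, c1, c2, c5, n_l):
    """Cost (2.10) under the simplification C3 = C1 and C4 = C2."""
    removed = m(n, **theta)
    return c1 * removed + c2 * (m(n_l, **theta) - removed) + c5 * n


def release_run(theta, c1, c2, c5, n_l, lam0=None):
    """Cheapest n in 1..n_l, optionally restricted to lambda(n) <= lam0."""
    candidates = [n for n in range(1, n_l + 1)
                  if lam0 is None or intensity(n, theta) <= lam0]
    if not candidates:
        return None
    return min(candidates, key=lambda n: cost(n, theta, c1, c2, c5, n_l))


ds2 = dict(a=1385.0, b=0.1339, c=0.0743, p=0.8555)  # estimates reported for DS2
costs = dict(c1=5.0, c2=10.0, c5=40.0, n_l=150)     # c1 = 5 is an assumption

n_cost_only = release_run(ds2, **costs)
n_star = release_run(ds2, lam0=0.999, **costs)
print(n_cost_only, n_star)
print(intensity(n_star, ds2), cost(n_star, ds2, **costs))
```

Because the parameter estimates are rounded to four digits, the intensity and cost printed by this sketch can differ slightly from the figures quoted in the text.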

2.7 : CONCLUSION :

In this chapter, we have proposed a new discrete
SRGM. At times discrete SRGMs are more suitable to describe
the software error detection/removal phenomenon than continuous
SRGMs. In the proposed model the assumption of error
independence has been relaxed.

Moreover, the proposed model can cater for various types of
software growth modelling from pure exponential to highly
s-shaped. Thus the proposed model can be applied to
different testing environments.














CHAPTER-3 


DISCRETE IMPERFECT DEBUGGING SOFTWARE RELIABILITY GROWTH MODEL 

3.0 : INTRODUCTION : 

In this chapter we propose a discrete software
reliability growth model based on the Non Homogeneous Poisson
Process to describe the fault removal phenomenon under an
imperfect debugging environment. The learning process is
taken into consideration by assuming that the probability of
imperfect debugging is dependent on the number of faults
remaining. The model has a flexible structure as it can
describe different growth curves ranging from exponential to
highly S-shaped. The applicability of the model is shown by
applying it to data obtained from different software
development projects.

Software Reliability Growth Models (SRGMs) are generally
classified into two groups. The first contains the models
which use the execution or calendar time as a unit of fault
detection (removal) period and such models are called
continuous time models. The second group contains models
which use the test occasion (case) as a unit of fault
detection (removal) period and such models are called
discrete SRGMs. A test occasion (case) can be a single
computer test run or a series of computer test runs executed
in an hour, day, week or even month. The test occasion
(case) includes the computer test run as well as the length of
time spent to visually inspect the software source code,
Brooks and Motley [1], whereas a computer test run is a set
of software input variables arranged in a certain manner to
test the functional performance of a particular part of the
software system. A large number of models have been
developed in the first group while fewer are there in the
second group. Yamada and Osaki [8] proposed two discrete
SRGMs. Kapur et al [6] proposed a general discrete software
reliability growth model based on the assumption that
software may contain several types of faults. In all these
models the fault removal process (fault debugging) is
assumed to be perfect, i.e. when an attempt is made to remove
a fault, it is removed with certainty. This assumption may
not be realistic. Due to the complexity of the software
system and the incomplete understanding of the software
requirements, specifications and structure, the testing team
may not be able to remove the fault perfectly and the
original fault is replaced by another fault. The new fault
may generate new failures when this part of the software
system is traversed during the testing. The fault can be
removed perfectly when the testing team properly understands
the nature of the fault and takes the necessary steps to
remove it. The multiple removal of the original faults and
their successors, i.e. the faults which replace the original
faults, slows down the removal of the original faults and
gives rise to an S-shaped growth curve. The concept of
imperfect debugging was first introduced by Goel [2]. He
introduced the probability of imperfect debugging in the J.M.
model [4]. Kapur and Garg [5] introduced imperfect
debugging in the Goel and Okumoto [3] model. They assumed that
the fault removal rate per remaining fault is reduced due
to imperfect debugging. Thus the number of failures observed
by time infinity is more than the initial fault content.
Although these two models describe the imperfect debugging
phenomenon, the software reliability growth curve of
these models is always exponential. Moreover, they assume
that the probability of imperfect debugging is independent
of the testing time. Thus they ignore the role of the
learning process during the testing phase by not accounting
for the experience gained with the progress of software
testing. Actually, the probability of imperfect debugging is
supposed to be maximum in the early stage of the testing phase
and is supposed to reduce with the progress of testing. Xia
et al [7] also proposed an SRGM considering the role of the
learning process in the reduction of the probability of
imperfect debugging. This model is based on sound
assumptions but the determination of its parameters requires
extra information such as the initial value of the probability
of perfect debugging and the value of the learning factor.
This information requires collection of extra data to use
the model.


In this chapter, we propose a discrete time SRGM based on the
Non-Homogeneous Poisson Process (NHPP) to describe the fault
removal phenomenon under an imperfect debugging environment.
The learning process is taken into consideration by assuming
that the probability of imperfect debugging is dependent on
the number of faults remaining (removed). The model has a
flexible structure and can thus describe different growth
curves. Further, the model is tested on real software
fault data obtained from various software development
projects. The data sets are cited from Brooks and Motley [1].

3.1 : ASSUMPTIONS :


1. The fault removal/failure observation phenomenon
follows a Non-Homogeneous Poisson Process (NHPP).

2. The software system is subject to failures at random
times caused by software faults remaining in the
software.

3. The expected number of failures observed between the
nth and (n+1)th test occasions is proportional to
the expected number of faults remaining in the
software.

4. On the observation of a software failure, the efforts
to remove the cause of the failure (the fault) may not
be perfect and thus another version of the fault may
replace the original fault.

5. The rate of imperfect debugging is decreasing with the
testing time and is proportional to the number of
faults remaining in the software.

6. The imperfect fault debugging does not increase the
initial fault content.


3.2 : NOTATIONS : 

a = The initial fault content in the beginning of the
testing.

b = The removal rate per remaining fault.

c = The initial imperfect debugging rate per fault.

$m_r(n)$ = The expected mean number of original faults removed
by the nth test occasion (run).


3.3 : MODEL ANALYSIS AND FORMULATIONS : 


Under the above assumptions, the expected number
of original faults removed between the nth and (n+1)th test
runs satisfies the following difference equation

$$m_r(n+1) - m_r(n) = b\left[a - m_r(n)\right] - c\left[\frac{a - m_r(n+1)}{a}\right]\left[a - m_r(n)\right] \tag{3.1}$$

The first term $b\,[a - m_r(n)]$ represents the intensity of
faults debugged, while the negative term represents the
intensity of imperfectly debugged faults. In other words,
the intensity of faults removed is the intensity of faults
debugged minus the expected intensity of the imperfectly
debugged faults. To elaborate further, the initial imperfect
debugging rate c is decreasing in the proportion of
$[a - m_r(n+1)]/a$ as the testing progresses. Therefore, the
remaining fault content $(a - m_r(n))$ is imperfectly debugged at
the rate $c\,[a - m_r(n+1)]/a$. As the imperfectly debugged faults
spawn new versions of their own, consequently, these faults
will generate more faults. Solving (3.1) using $m_r(0) = 0$, we
get

$$m_r(n) = a\,\frac{(b-c)\left(1 - (1-b)^n\right)}{(b-c) + c\,(1-b)^n}$$

Considering $\psi = \dfrac{c}{b-c}$, we get

$$m_r(n) = a\,\frac{1 - (1-b)^n}{1 + \psi\,(1-b)^n} \tag{3.2}$$

From (3.2), we can see that $m_r(\infty) = a$. This indicates that
the original faults are removed completely after a long time
of testing.
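The closed form can be checked against the difference equation it was derived from. The sketch below rearranges (3.1) in terms of the remaining faults u(n) = a - m_r(n), which gives u(n+1) = (1-b) u(n) / (1 - (c/a) u(n)), and compares the iteration with (3.2); the helper names are illustrative, and the parameter values are the DS-2 estimates quoted later in this chapter.

```python
def m_r(n, a, b, c):
    """Closed form (3.2) with psi = c / (b - c); requires b != c."""
    psi = c / (b - c)
    x = (1.0 - b) ** n
    return a * (1.0 - x) / (1.0 + psi * x)


def m_r_by_recursion(n, a, b, c):
    """Iterate (3.1), rewritten for the remaining faults u(n) = a - m_r(n)."""
    u = a  # m_r(0) = 0, so all a faults remain at the start
    for _ in range(n):
        u = (1.0 - b) * u / (1.0 - (c / a) * u)
    return a - u


a, b, c = 1325.0, 0.181, 0.172   # DS-2 estimates
for n in (1, 5, 15, 35):
    print(n, round(m_r(n, a, b, c), 2), round(m_r_by_recursion(n, a, b, c), 2))
```

With c close to b the first increments grow from one test occasion to the next, which is the S-shaped behaviour discussed for DS-2; with c much smaller than b the curve is close to exponential.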


3.4 : PARAMETER ESTIMATION : 

The Maximum Likelihood Estimation (M.L.E.)
method is used to obtain the parameter estimates of the
model given in (3.2). As the fault removal data used in this
chapter are given in the form of pairs $(n_i, x_i)$
$(i = 1, 2, \ldots, k)$, where $x_i$ is the cumulative number of faults
removed by the $n_i$th test occasion, $0 < n_1 < n_2 < \cdots < n_k$,
the likelihood function is given by

$$L\left(a, b, c \mid (n_i, x_i)\right) = \prod_{i=1}^{k} \frac{\left[m_r(n_i) - m_r(n_{i-1})\right]^{x_i - x_{i-1}}}{(x_i - x_{i-1})!}\; e^{-m_r(n_k)} \tag{3.3}$$

The parameter estimates are obtained by maximizing (3.3)
with respect to each of the parameters. The DNCONF
subroutine of the IMSL MATH Library is used to maximize (3.3)
and obtain the parameter estimates.

3.5 : DATA ANALYSIS :


To check the validity of the model, it is
tested on two data sets cited from Brooks and Motley [1].

DS-1 :

The data is given in the form $(n_i, x_i)$,
$i = 1, 2, \ldots, 12$ and the number of faults detected by the 12th
test occasion is 2657. The estimated values of the model
parameters are

$\hat{a} = 3169$, $\hat{b} = 0.148$, $\hat{c} = 0.0173$

The proposed model estimates the presence of the imperfect debugging
phenomenon in this project. This rate is much lower than the
fault debugging rate per remaining fault (b). The fitting of
the model is graphically illustrated in Fig. (1). It is
clear that the model fits the fault data excellently. It
may be noted that the relationship between the cumulative
number of faults and the test occasions is exponential.

DS-2 :

The software fault data is given in the form $(n_i, x_i)$, $i = 1, 2, \ldots, 35$, and the number of faults detected by the 35th test occasion is 1301. The estimated values of the model parameters are

\[
\hat{a} = 1325, \qquad \hat{b} = 0.181, \qquad \hat{c} = 0.172
\]

The proposed model estimates the presence of imperfect debugging in this project. The initial value of the imperfect debugging rate is close to the value of (b). This indicates that imperfect debugging had a significant impact on the progress of the test at the early stages of the testing. This hypothesis is supported by Fig. (2), as it is clearly seen that the relationship between the cumulative number of faults removed and the test occasions is not exponential (as is the case in DS-1, Fig. (1)) but S-shaped. In other words, the S-shapedness in the reliability growth is attributed to the significant presence of the imperfect debugging phenomenon. Further, Fig. (2) graphically illustrates the goodness of fit of this model. It is clearly seen that the proposed model fits the observed fault data excellently.
CHAPTER - 4


BAYESIAN ESTIMATION FOR THE GENERALISED THREE PARAMETER

GAMMA DISTRIBUTION WITH TREE DIAMETER DATA

4.0 : INTRODUCTION :

The Gamma distribution is useful in many applications such as reliability analysis and forestry, including the modelling of tree diameters. We have used the three parameter generalised Gamma density to model the distribution of tree diameters in forest stands. Green E.J. et al. (1994) considered a Bayesian estimation approach for the three parameter Weibull density to model the distribution of tree diameters in forest stands. As we know, the Weibull density is widely used in applications such as reliability analysis, forestry and tree diameter data, but the Gamma density is equally good for such applications. Krug, Nordheim and Giese (1984) used the Weibull density in modelling tree growth, survivorship and height distributions. Smith and Naylor (1987) found it difficult to estimate the parameters of the Weibull density with reliability data using the maximum likelihood estimation method: the estimate obtained for the location parameter is negative. Moreover, we will get a negative estimate for the location parameter in the case of modelling the tree diameter distribution.


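4.1 : MAXIMUM LIKELIHOOD ESTIMATION :

Let us consider the generalised three parameter Gamma distribution as

\[
f(x; \theta_1, \theta_2, \theta_3) = \frac{1}{\theta_2 \, \Gamma(\theta_3)} \left( \frac{x - \theta_1}{\theta_2} \right)^{\theta_3 - 1} \exp\left( -\frac{x - \theta_1}{\theta_2} \right), \qquad x > \theta_1; \ \theta_2, \theta_3 > 0 \tag{4.1}
\]

where

$\theta_1$ is a location parameter
$\theta_2$ is a scale parameter
$\theta_3$ is a shape parameter

To obtain the maximum likelihood estimates of $\theta_1$, $\theta_2$ and $\theta_3$, we differentiate the log-likelihood function with respect to $\theta_i$ $(i = 1, 2, 3)$. Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ drawn from the above distribution. The likelihood function is given by

\[
L(\theta_1, \theta_2, \theta_3) = \bigl[\theta_2 \, \Gamma(\theta_3)\bigr]^{-n} \prod_{i=1}^{n} \left( \frac{x_i - \theta_1}{\theta_2} \right)^{\theta_3 - 1} \exp\left( -\sum_{i=1}^{n} \frac{x_i - \theta_1}{\theta_2} \right)
\]

\[
= \theta_2^{-n\theta_3} \bigl[\Gamma(\theta_3)\bigr]^{-n} \prod_{i=1}^{n} (x_i - \theta_1)^{\theta_3 - 1} \exp\left( -\sum_{i=1}^{n} \frac{x_i - \theta_1}{\theta_2} \right), \qquad x_i > \theta_1; \ \theta_2, \theta_3 > 0 \tag{4.2}
\]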

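Taking logarithm on both sides of (4.2), the log-likelihood function is given by

\[
\log_e L = -n \log_e \theta_2 - n \log_e \Gamma(\theta_3) + (\theta_3 - 1) \sum_{i=1}^{n} \bigl[ \log_e (x_i - \theta_1) - \log_e \theta_2 \bigr] - \sum_{i=1}^{n} \frac{x_i - \theta_1}{\theta_2} \tag{4.3}
\]

Differentiating with respect to $\theta_1$, $\theta_2$ and $\theta_3$, we have

\[
\frac{\partial \log_e L(\theta_1, \theta_2, \theta_3)}{\partial \theta_1} = \sum_{i=1}^{n} \frac{(\theta_3 - 1)(-1)}{x_i - \theta_1} + \frac{n}{\theta_2} = 0 \tag{4.4}
\]

\[
\frac{\partial \log_e L(\theta_1, \theta_2, \theta_3)}{\partial \theta_2} = -\frac{n}{\theta_2} - \frac{n(\theta_3 - 1)}{\theta_2} + \sum_{i=1}^{n} \frac{x_i - \theta_1}{\theta_2^{2}} = 0 \tag{4.5}
\]

\[
\frac{\partial \log_e L(\theta_1, \theta_2, \theta_3)}{\partial \theta_3} = \sum_{i=1}^{n} \log_e (x_i - \theta_1) - n \log_e \theta_2 - n \frac{\Gamma'(\theta_3)}{\Gamma(\theta_3)} = 0 \tag{4.6}
\]

From Equation (4.5), we have

\[
-\frac{n\theta_3}{\theta_2} + \frac{n(\bar{x} - \theta_1)}{\theta_2^{2}} = 0
\]

\[
-n\theta_2\theta_3 + n\bar{x} - n\theta_1 = 0
\]

or

\[
\theta_1 = \bar{x} - \theta_2 \theta_3 \tag{4.8}
\]

From Equation (4.6), we have

\[
\log_e \prod_{i=1}^{n} (x_i - \theta_1) - n \log_e \theta_2 - n \frac{\Gamma'(\theta_3)}{\Gamma(\theta_3)} = 0
\]

\[
\log_e \theta_2 = \frac{1}{n} \log_e \prod_{i=1}^{n} (x_i - \theta_1) - \frac{\Gamma'(\theta_3)}{\Gamma(\theta_3)} \tag{4.9}
\]

Since equation (4.7) and equation (4.9) do not provide any explicit solution for $\theta_i$ $(i = 1, 2, 3)$, the maximum likelihood estimates can be obtained by an iterative scheme.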
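As a rough illustration of such a numerical scheme, the sketch below evaluates the log-likelihood (4.3) and uses the closed-form relation from (4.5), $\theta_2 = (\bar{x} - \theta_1)/\theta_3$, to remove the scale parameter before searching over the other two. The crude grid search is a stand-in chosen for brevity, not the iterative scheme of the thesis, and the function names, data and grids are hypothetical.

```python
import math

def gamma3_loglik(x, t1, t2, t3):
    """Log-likelihood (4.3) of the three parameter Gamma density."""
    if t2 <= 0 or t3 <= 0 or t1 >= min(x):
        return -math.inf  # outside the parameter space
    n = len(x)
    log_terms = sum(math.log(xi - t1) - math.log(t2) for xi in x)
    return (-n * math.log(t2) - n * math.lgamma(t3)
            + (t3 - 1.0) * log_terms
            - sum(xi - t1 for xi in x) / t2)

def scale_given_others(x, t1, t3):
    """Solve (4.5) for theta2 with theta1 and theta3 fixed."""
    xbar = sum(x) / len(x)
    return (xbar - t1) / t3

def profile_search(x, t1_grid, t3_grid):
    """Grid search over (theta1, theta3) with theta2 profiled out via (4.5)."""
    best = None
    for t1 in t1_grid:
        for t3 in t3_grid:
            t2 = scale_given_others(x, t1, t3)
            ll = gamma3_loglik(x, t1, t2, t3)
            if best is None or ll > best[0]:
                best = (ll, t1, t2, t3)
    return best

# Made-up tree diameters (illustrative only, not thesis data).
diameters = [12.1, 14.3, 15.0, 16.8, 18.2, 19.5, 21.7, 24.0]
t1_grid = [0.5 * k for k in range(0, 24)]  # 0.0 .. 11.5, below the sample minimum
t3_grid = [0.5 * k for k in range(1, 21)]  # 0.5 .. 10.0
best_ll, t1_hat, t2_hat, t3_hat = profile_search(diameters, t1_grid, t3_grid)
```

Restricting the grid for $\theta_1$ to values below the smallest observation keeps every candidate inside the support of (4.1); an unconstrained optimizer would not respect this automatically.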






4.2 : BAYESIAN MODEL :

Although the Weibull distribution is generally used for a large number of diameter distributions, we have considered in this model the generalised three parameter Gamma distribution. In such types of problems investigators do not often report the parameter estimates from individual samples, so data based priors are not readily available. Thus we choose vague priors for $\theta_1$, $\theta_2$ and $\theta_3$. As $\theta_2$ and $\theta_3$ are constrained to be positive, we adopt the Jeffreys (1961) prior for the positive parameters $\theta_2$ and $\theta_3$, i.e.

\[
g(\theta_2) \propto \frac{1}{\theta_2} \tag{4.10}
\]

and

\[
g(\theta_3) \propto \frac{1}{\theta_3} \tag{4.11}
\]

Regarding the parameter $\theta_1$, we know that it is in the interval $(0, x_{(1)})$, where $x_{(1)}$ is the first order statistic, i.e. the minimum diameter in the sample. Therefore, we choose the prior for the parameter $\theta_1$ to be uniformly distributed, i.e.

\[
g(\theta_1) = \frac{1}{x_{(1)}}, \qquad 0 < \theta_1 < x_{(1)} \tag{4.12}
\]

The condition $\theta_1 < x_{(1)}$ ensures that $\theta_1$ will lie in the interval $(0, x_{(1)})$. The Bayesian model is completed by specifying the prior distributions given in (4.10), (4.11) and (4.12).

The joint posterior distribution is then obtained by application of Bayes' theorem:

\[
p(\theta_1, \theta_2, \theta_3 \mid x) \propto L(x \mid \theta_1, \theta_2, \theta_3) \, g(\theta_1, \theta_2, \theta_3)
\]

where $L(x \mid \theta_1, \theta_2, \theta_3)$ is the likelihood function and $g(\theta_1, \theta_2, \theta_3)$ is the joint prior for $\theta_1$, $\theta_2$ and $\theta_3$. Further, we will take the priors for $\theta_1$, $\theta_2$ and $\theta_3$ to be independent, whereas dependencies in the posterior arise due to the sample data.

The full Bayesian model is given by

\[
p(\theta_1, \theta_2, \theta_3 \mid x) \propto \frac{1}{\theta_2 \theta_3} \bigl[\theta_2 \, \Gamma(\theta_3)\bigr]^{-n} \prod_{i=1}^{n} \left( \frac{x_i - \theta_1}{\theta_2} \right)^{\theta_3 - 1} \exp\left( -\sum_{i=1}^{n} \frac{x_i - \theta_1}{\theta_2} \right), \qquad x_{(1)} > \theta_1 > 0, \ \theta_2 > 0, \ \theta_3 > 0 \tag{4.13}
\]

The normalising constant of equation (4.13), and hence the parameter estimates, are difficult to obtain analytically. However, we can apply a stochastic simulation procedure with an iterative scheme to estimate the parameters involved in equation (4.13). For instance, suppose that $\theta_{10}$, $\theta_{20}$ and $\theta_{30}$ are our initial values of the parameters $\theta_1$, $\theta_2$ and $\theta_3$. We can use the Gibbs sampler, which is based on an iterative scheme. To make use of the Gibbs sampler we must be able to sample from the full conditional distribution for each parameter. The full conditionals for $\theta_1$, $\theta_2$ and $\theta_3$ are given by




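\[
p(\theta_2 \mid \theta_1, \theta_3, x) \propto \theta_2^{-(n\theta_3 + 1)} \exp\left( -\frac{1}{\theta_2} \sum_{i=1}^{n} (x_i - \theta_1) \right), \qquad \theta_2 > 0 \tag{4.14}
\]

\[
p(\theta_3 \mid \theta_1, \theta_2, x) \propto \frac{1}{\theta_3 \bigl[\Gamma(\theta_3)\bigr]^{n}} \prod_{i=1}^{n} \left( \frac{x_i - \theta_1}{\theta_2} \right)^{\theta_3 - 1}, \qquad \theta_3 > 0 \tag{4.15}
\]

\[
p(\theta_1 \mid \theta_2, \theta_3, x) \propto \prod_{i=1}^{n} (x_i - \theta_1)^{\theta_3 - 1} \, e^{n\theta_1/\theta_2}, \qquad 0 < \theta_1 < x_{(1)} \tag{4.16}
\]

Random numbers can be generated from (4.14) by drawing a random variable $Z$ from a Gamma distribution with shape $n\theta_3$ and rate $\sum_{i=1}^{n} (x_i - \theta_1)$ and setting $\theta_2 = 1/Z$. Following an initial guess at the value of each parameter, the Gibbs sampler proceeds by iteratively generating a new value for each parameter. If $\theta_{10}$, $\theta_{20}$ and $\theta_{30}$ are the initial values, we generate $\theta_{11}$ from $[\theta_1 \mid \theta_{20}, \theta_{30}, x]$, then $\theta_{21}$ from $[\theta_2 \mid \theta_{11}, \theta_{30}, x]$, and finally $\theta_{31}$ from $[\theta_3 \mid \theta_{11}, \theta_{21}, x]$. This constitutes one iteration of the sampler. In the next iteration, the initial values are replaced by those from the first iteration, the values from the first iteration by those from the second iteration, and so on.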
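The iteration just described can be sketched in a few lines. The scale parameter is drawn exactly through the Gamma variable $Z$; the conditionals of $\theta_1$ and $\theta_3$ are not standard distributions, so this sketch swaps in a random-walk Metropolis step for each of them (a Metropolis-within-Gibbs variant, which is an assumption of this example and not a step taken from the thesis). Function names, step sizes and the simulated data are hypothetical.

```python
import math
import random

def log_cond_theta1(t1, t2, t3, x):
    """Log of the unnormalised full conditional of theta1 on (0, min(x))."""
    if not 0.0 < t1 < min(x):
        return -math.inf
    return (t3 - 1.0) * sum(math.log(xi - t1) for xi in x) + len(x) * t1 / t2

def log_cond_theta3(t3, t1, t2, x):
    """Log of the unnormalised full conditional of theta3 (> 0)."""
    if t3 <= 0.0:
        return -math.inf
    s = sum(math.log((xi - t1) / t2) for xi in x)
    return -len(x) * math.lgamma(t3) - math.log(t3) + (t3 - 1.0) * s

def metropolis_step(current, log_target, step, rng):
    """One random-walk Metropolis update; returns the new or the unchanged value."""
    proposal = current + rng.gauss(0.0, step)
    log_u = -rng.expovariate(1.0)  # distributed as log(U), U uniform on (0, 1]
    if log_u < log_target(proposal) - log_target(current):
        return proposal
    return current

def gibbs_sampler(x, init, n_iter, seed=0, step1=0.2, step3=0.2):
    """Cycle through theta1, theta2, theta3, each updated given the latest others."""
    rng = random.Random(seed)
    t1, t2, t3 = init
    n = len(x)
    draws = []
    for _ in range(n_iter):
        t1 = metropolis_step(t1, lambda v: log_cond_theta1(v, t2, t3, x), step1, rng)
        # exact draw: Z ~ Gamma(shape n*theta3, rate sum(x_i - theta1)), theta2 = 1/Z
        rate = sum(xi - t1 for xi in x)
        t2 = 1.0 / rng.gammavariate(n * t3, 1.0 / rate)
        t3 = metropolis_step(t3, lambda v: log_cond_theta3(v, t1, t2, x), step3, rng)
        draws.append((t1, t2, t3))
    return draws

# Simulated diameters (illustrative only): location 2, shape 3, scale 1.5.
data_rng = random.Random(1)
sample = [2.0 + data_rng.gammavariate(3.0, 1.5) for _ in range(40)]
posterior_draws = gibbs_sampler(sample, (0.5 * min(sample), 1.0, 1.0), 500)
```

Note that `random.gammavariate` takes a scale, not a rate, hence the `1.0 / rate` argument. In practice an initial portion of the draws would be discarded before summarising the posterior.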
4.3 : CONCLUSION :

The main purpose here is to demonstrate the Bayesian procedure for the three parameter Gamma distribution, which is better than any typical method found in the literature. A numerical illustration can be done if data on the tree diameter distribution is available. It can be seen that maximum likelihood estimation is problematic with the tree diameter distribution because of negative estimates of the location parameter. The Bayesian model is easy to fit and the results are also in accordance with our prior expectations. In particular, widely used ad hoc rules such as "set $\hat{\theta}_1 = 0$ whenever $\hat{\theta}_1 < 0$" are unnecessary with Bayesian inference. If the Gamma density is to be used for a tree diameter distribution, then prediction would probably be made using the modal values of the joint posterior sample of the parameters, whereas in the Bayesian approach, once the samples have been obtained from the full posterior, these can be used to great advantage for simulating any predictive distribution of interest.
CHAPTER-5


A BAYESIAN ESTIMATION APPROACH TO PROPORTIONAL HAZARD MODELS

FOR

COVARIATES AND INSTITUTIONAL EFFECTS

5.0 : INTRODUCTION : 

Institutional variation is an important factor to examine in randomised clinical trials. In randomised clinical trials for comparing treatments for diseases such as cancer, it is sometimes necessary to include patients from different institutions to complete the sample size in a reasonable period of time. One of the reasons to examine institutional variation is that the objective of a clinical trial is to try to draw conclusions about the overall effect of therapy in the population. Since the institutions are not selected at random and only a small random subset of the patients are entered on trials, with substantial institutional variation it would not be clear exactly what effect would be seen in the general population. Another reason to examine institutional variation is that it might be possible to learn more about how the therapy should be given or to whom it should be given. There are several studies of institutional effects in multi center trials. Skene and Wakefield (1990) use the same type of hierarchical Bayesian structure to model data from multi center binary response trials. Boos and Brownie (1992) proposed rank based methods for analysing data from multi center trials with continuous outcomes with a linear model structure. Stangl and Greenhouse (1992) developed a hierarchical Bayesian survival model for examining institutional differences. Gray (1994) considered a Bayesian analysis of institutional effects in a multi center cancer clinical trial. The structure of their model is quite different from the proportional hazard model used here.

5.1 : BAYESIAN MODEL :

A proportional hazard model is assumed for institutional effects and covariates. Let $x_{ijk}$ be covariate $k$ for subject $j$ from institution $i$. Let us assume that there are $N$ institutions with $n_i$ cases/patients from institution $i$ and $(p-1)$ covariates, with $x_{ijp}$ as the treatment variable.

Let $0 = t_0 < t_1 < \cdots < t_m$ be the boundaries of the time intervals and set $I_l(t) = I(t_{l-1} < t \le t_l)$. The hazard can be written as

\[
\log h(t \mid x_{ij}, \alpha, \beta, \theta_i) = \sum_{l=1}^{m} \alpha_l I_l(t) + \theta_{i0} + \sum_{k=1}^{p} x_{ijk} \beta_k + \theta_{i1} x_{ijp} \tag{5.1}
\]

where $\alpha$ and $\beta$ are unknown parameters, $\theta_{i0}$ are institutional deviations from the underlying log hazard and $\theta_{i1}$ are institutional deviations from the overall treatment effect $\beta_p$.

Let us consider four covariates $x_1$, $x_2$, $x_3$ and $x_4$, where

$x_1$ : performance status
$x_2$ : ... diagnosis
$x_3$ : age in years
$x_4$ : prior therapy

The model then becomes

\[
\log h(t \mid x_{ij}) = \sum_{l=1}^{m} \alpha_l I_l(t) + \theta_{i0} + x_{ij1}\beta_1 + x_{ij2}\beta_2 + x_{ij3}\beta_3 + x_{ij4}\beta_4 + \theta_{i1} x_{ij4}
\]

Let us assume the following priors for the model. Each $\beta_k$ follows a double exponential distribution with probability density function

\[
g(\beta_k) = \tfrac{1}{2} \, e^{-|\beta_k|}, \qquad -\infty < \beta_k < \infty
\]

Let the $\theta_i$ be independent and identically multivariate normally distributed with mean zero and variance unity,

\[
\theta_i \overset{\text{i.i.d.}}{\sim} N(0, I), \qquad \text{each component having density } g(\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\theta^{2}/2}
\]

and

\[
\alpha_l - \alpha_{l-1} \mid v \sim N(0, v^{-1}), \qquad l = 1, 2, \ldots, m
\]

\[
v \sim \gamma(\mu, \lambda), \qquad g(v) = \frac{\lambda^{\mu}}{\Gamma(\mu)} \, v^{\mu - 1} e^{-\lambda v}, \qquad v > 0
\]

The prior for $\alpha$ restricts the magnitude of the jump between adjacent intervals in a piecewise constant model.

Let

\[
\delta_{ijl} = \begin{cases} 1 & \text{if subject } (i, j) \text{ is observed to fail in time interval } l \\ 0 & \text{otherwise} \end{cases}
\]

Let $T_{ijl} = 0$ if the failure or censoring time $t_{ij} \le t_{l-1}$; otherwise $T_{ijl}$ is the time subject $(i, j)$ spends at risk in interval $l$, that is, $\min(t_{ij}, t_l) - t_{l-1}$. The likelihood function corresponding to this model is given by

\[
L(\alpha, \beta, \theta) = \prod_{i=1}^{N} \prod_{j=1}^{n_i} h(t_{ij} \mid x_{ij}, \alpha, \beta, \theta_i)^{\delta_{ij}} \exp\left( -\int_{0}^{t_{ij}} h(u \mid x_{ij}, \alpha, \beta, \theta_i) \, du \right)
\]

\[
= \prod_{i=1}^{N} \prod_{j=1}^{n_i} \prod_{l=1}^{m} \exp\left( \delta_{ijl} \, \eta_{ijl} - T_{ijl} \, e^{\eta_{ijl}} \right) \tag{5.4}
\]

where $\delta_{ij} = \sum_{l} \delta_{ijl}$ and

\[
\eta_{ijl} = \alpha_l + \theta_{i0} + x_{ij1}\beta_1 + x_{ij2}\beta_2 + x_{ij3}\beta_3 + (\beta_4 + \theta_{i1}) \, x_{ij4} \tag{5.5}
\]
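To make the piecewise form of (5.4) concrete, the sketch below computes the log-likelihood contribution of a single subject, $\sum_l (\delta_{ijl}\eta_{ijl} - T_{ijl}e^{\eta_{ijl}})$. It assumes the treatment variable is the last entry of the covariate list and that the follow-up time does not exceed the last boundary; the function names and argument layout are hypothetical.

```python
import math

def exposure_times(t, boundaries):
    """Time at risk T_l in each interval (t_{l-1}, t_l] for follow-up time t."""
    return [max(0.0, min(t, hi) - lo)
            for lo, hi in zip(boundaries[:-1], boundaries[1:])]

def subject_loglik(t, failed, x, alpha, beta, theta0, theta1, boundaries):
    """Contribution sum_l (delta_l * eta_l - T_l * exp(eta_l)) of one subject to log (5.4)."""
    # Covariate part of eta; theta1 is the institutional deviation added to the
    # coefficient of the last covariate (assumed here to be the treatment variable).
    lin = theta0 + sum(b * xk for b, xk in zip(beta, x)) + theta1 * x[-1]
    total = 0.0
    for l, time_at_risk in enumerate(exposure_times(t, boundaries)):
        eta = alpha[l] + lin
        lo, hi = boundaries[l], boundaries[l + 1]
        delta = 1.0 if failed and lo < t <= hi else 0.0  # failure observed in interval l
        total += delta * eta - time_at_risk * math.exp(eta)
    return total

# Illustrative call: two intervals, one covariate, a failure at time 2.
contribution = subject_loglik(
    t=2.0, failed=True, x=[1.0], alpha=[-1.0, -0.5], beta=[0.3],
    theta0=0.1, theta1=-0.2, boundaries=[0.0, 1.0, 10.0],
)
```

Summing this quantity over all subjects and institutions gives the logarithm of (5.4). A censored subject contributes only the exposure terms, since every $\delta_{ijl}$ is zero.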

The joint posterior is then proportional to

\[
L(\alpha, \beta, \theta) \prod_{i=1}^{N} g(\theta_i) \; g(\alpha \mid v) \, g(v) \, g(\beta) \tag{5.6}
\]

where the $g(\cdot)$ functions are the prior densities. This is not easy to compute directly, but it is possible to generate samples from the joint posterior using Gibbs sampling. In Gibbs sampling, observations are generated from the joint posterior distribution by sampling from the full conditional distributions. See Gelfand and Smith (1990), Gelfand et al. (1990), Zeger and Karim (1991) and Clayton (1991).

The parameters that are required to specify the prior densities are $\lambda$ and $\mu$, in the prior for $v$. Proper priors are used, but the parameters are chosen to keep the priors fairly weak. For institutional effects this is justified, since there is little prior information on the magnitude of these parameters. For the prior for $v$, it is noted that $1/v$ is the variance of the jumps in the log-underlying hazards at the boundaries of the time intervals. The question of the magnitude of the survival difference can also be addressed using predictive distributions. The survival curve for the predictive distribution for a new case from institution $i$ is given by the integral of

\[
S(t \mid x, \theta_i) = \exp\left( -\int_{0}^{t} h(u \mid x, \theta_i) \, du \right)
\]

over the posterior distribution, where $h(\cdot)$ is given by (5.1). With Gibbs sampling, the integral over the posterior is calculated by averaging over the generated parameter values. Data analysis has not been done because of non-availability of data.
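The averaging step can be sketched as follows: for each generated parameter value the survival probability under the piecewise constant hazard is computed, and the results are averaged. The layout of a posterior draw as a tuple `(alpha, beta, theta0, theta1)`, the function names and the two hand-made draws are assumptions of this illustration; as above, the treatment variable is taken to be the last covariate.

```python
import math

def survival(t, x, alpha, beta, theta0, theta1, boundaries):
    """exp(-integral_0^t h(u | x) du) for the piecewise constant hazard."""
    lin = theta0 + sum(b * xk for b, xk in zip(beta, x)) + theta1 * x[-1]
    cum_hazard = 0.0
    for l, (lo, hi) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        time_at_risk = max(0.0, min(t, hi) - lo)
        cum_hazard += time_at_risk * math.exp(alpha[l] + lin)
    return math.exp(-cum_hazard)

def predictive_survival(t, x, draws, boundaries):
    """Average the survival curve at time t over posterior draws."""
    values = [survival(t, x, a, b, th0, th1, boundaries)
              for a, b, th0, th1 in draws]
    return sum(values) / len(values)

# Two hand-made "draws" standing in for Gibbs output (illustrative only).
example_draws = [
    ([-1.0, -0.5], [0.3], 0.1, -0.2),
    ([-1.2, -0.4], [0.2], 0.0, 0.1),
]
curve = [predictive_survival(t, [1.0], example_draws, [0.0, 1.0, 10.0])
         for t in (0.5, 1.0, 2.0, 5.0)]
```

Averaging the survival probabilities, rather than plugging averaged parameters into the survival function, is what makes the result a predictive distribution.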
5.2 : CONCLUSION :

This chapter deals with a Bayesian estimation procedure for studying the amount of institutional variation in a clinical trial using the proportional hazard model. A Bayesian structure is used, with the prior for the coefficients as a double exponential distribution and the prior for the institutional deviations $\theta_i$ as a standard multivariate normal density with mean vector zero and variance-covariance matrix $I$. The prior for $\alpha$ restricts the magnitude of the jump between adjacent intervals: $\alpha_l - \alpha_{l-1}$ is a normal variate with variance $1/v$. Further, the prior for $v$ is a Gamma distribution with parameters $\mu$ and $\lambda$. The posterior distribution is calculated using Gibbs sampling. The methods could not be applied to data from the Lung Cancer trial because of non-availability of data. This study can be taken further by applying it to Lung Cancer trial data. We can predict that there appears to be substantial variation in the treatment effect across institutions, although the reasons for this have not been identified. It would be possible to investigate this further through a detailed examination of the data from the institutions with extreme effects.